Abstract

Portfolio allocation with gross-exposure constraint is an effective method to increase the efficiency and stability of portfolios selected among a vast pool of assets, as demonstrated in Fan et al. (2008b). The required high-dimensional volatility matrix can be estimated by using high-frequency financial data. This enables us to better adapt to the local volatilities and local correlations among a vast number of assets and to increase significantly the sample size for estimating the volatility matrix. This paper studies volatility matrix estimation using high-dimensional high-frequency data from the perspective of portfolio selection. Specifically, we propose the use of the "pairwise-refresh time" and "all-refresh time" methods, based on the refresh time scheme of Barndorff-Nielsen et al. (2008), for estimation of vast covariance matrices, and compare their merits in portfolio selection. We also establish concentration inequalities for the estimates, which guarantee desirable properties of the estimated volatility matrix in vast asset allocation with gross-exposure constraints. Extensive numerical studies are made via carefully designed simulations. Compared with methods based on low-frequency daily data, our methods can capture the most recent trend of the time-varying volatility and correlation, and hence provide more accurate guidance for portfolio allocation in the next time period. The advantage of using high-frequency data is significant in our simulation and empirical studies, which consist of 50 simulated assets and 30 constituent stocks of the Dow Jones Industrial Average index.
1 Introduction



The mean-variance efficient portfolio theory of Markowitz (1952, 1959) has had a profound impact on modern finance. Yet its application to practical portfolio selection faces a number of challenges. It is well known that the selected portfolios depend too sensitively on the expected future returns and volatility matrix (Klein and Bawa, 1976; Best and Grauer, 1991; Chopra and Ziemba, 1993). This leads to the puzzle postulated by Jagannathan and Ma (2003): why does the no-short-sale portfolio outperform the efficient Markowitz portfolio? See also De Roon et al. (2001) for a study of the optimal no-short-sale portfolio in emerging markets. This sensitivity can be effectively addressed by introducing a constraint on the gross exposure of portfolios (Fan et al., 2008b). In particular, Fan et al. (2008b) show, with non-asymptotic inequalities, that for a range of gross-exposure constraint parameters, the actual risk of an empirically selected optimal portfolio, the actual risk of the theoretically optimal portfolio, and the estimated risk of an empirically selected optimal portfolio are in fact close. The accuracy depends only on the gross-exposure parameter and the maximum componentwise estimation error of the expected returns and covariance matrix; there is little error accumulation effect. The results are also demonstrated by both simulation and empirical studies. This gives not only a theoretical answer to the puzzle postulated by Jagannathan and Ma (2003) but also paves the way for optimal portfolio selection in practice.

The second challenge in implementing Markowitz's portfolio selection theory is the intrinsic difficulty of estimating a large volatility matrix. This is well documented in the statistics and econometrics literature even for a static large covariance matrix (Johnstone, 2001; Bickel and Levina, 2008; Fan et al., 2008a; Lam and Fan, 2009; Rothman et al., 2009). An additional challenge comes from the time-varying nature of a large volatility matrix. For a short or medium holding period (one day or one week, say), the expected volatility matrix in the near future can be very different from the average of the expected volatility matrix over a long time horizon (the past year, say). As a result, even if we knew the realized volatility matrix in the past exactly, the bias could still be large. This calls for a stable and robust portfolio selection, and portfolio allocation under the gross-exposure constraint provides the needed solution. To reduce the bias of the forecasted expected volatility matrix, we need to shorten the learning period to better capture the dynamics of the time-varying volatility matrix, adapting better to the local volatility and correlation. But this comes at the expense of a reduced sample size. The wide availability of high-frequency data provides a sufficient amount of data for reliable estimation of the volatility matrix.

Recent years have seen dramatic developments in the study of integrated volatility using high-frequency data. Statisticians and econometricians have been focusing on the interesting and challenging problem of volatility estimation in the presence of market microstructure noise and asynchronous trading, which are the stylized features of high-frequency financial data. The progress is very impressive, with a large literature. Assume that the price processes follow Brownian semimartingales, as required by the no-arbitrage-based characterization (Delbaen and Schachermayer, 1994). If there were no market microstructure noise, and if the processes were observed synchronously on grids that become denser, classical results in stochastic calculus show that the realized variance and realized covariance are consistent estimators of the quadratic variation and quadratic covariation of two price processes; see, for example, Karatzas and Shreve (2000) and Jacod and Shiryaev (2003). When directly applied to high-frequency financial data, however, the realized variance exhibits a large positive bias as the sampling frequency gets higher, as Andersen et al. (2003) show through their famous signature plots; Epps (1979) documented that correlation estimates based on realized covariances tend to be biased toward zero when sampled at high frequencies. Recent developments have enabled us to understand the signature plots and the Epps effect much better. Analytical explanations of how market microstructure noise and asynchronization may affect the estimates, and ways to correct the biases, have been given. In particular, in the one-dimensional case, where the focus is on estimation of integrated volatility, Aït-Sahalia et al. (2005) discussed a subsampling scheme; Zhang et al. (2005) proposed a two-scale estimate, which was extended and improved by Zhang (2006) to multiple scales; Fan and Wang (2007) separated jumps from diffusions in the presence of market microstructure noise using a wavelet method; robustness issues are addressed by Li and Mykland (2007); realized kernel methods are proposed and thoroughly studied in Barndorff-Nielsen et al. (2009a,b); Jacod et al. (2009) proposed a pre-averaging approach to reduce the market microstructure noise; Xiu (2008) demonstrated that a simple quasi-likelihood method achieves the optimal rate of convergence for estimating integrated volatility. For estimation of integrated covariation, the non-synchronized trading issue was first addressed by Hayashi and Yoshida (2005) in the absence of microstructure noise; the kernel method with the refresh time idea was first proposed by Barndorff-Nielsen et al. (2008); Zhang (2009) extends the two-scale method to the integrated covariation using a previous-tick method; Wang et al. (2009) aggregate the daily integrated volatility matrix via a factor model; Aït-Sahalia et al. (2010) extend the quasi-maximum likelihood method; Kinnebrock et al. (2009) extend the pre-averaging technique.

The aim of this paper is to study volatility matrix estimation using high-dimensional high-frequency data from the perspective of financial engineering. Specifically, our main topic is how to extract the covariation information from high-frequency data for asset allocation, and how effective it is. Two particular strategies are proposed for handling the non-synchronized trading: the "pairwise-refresh" and "all-refresh" schemes. The former retains many more data points and estimates the covariance matrix componentwise, so the estimate is usually not positive semi-definite, whereas the latter retains far fewer data points and the resulting covariance matrix is usually positive semi-definite. As a result, the former has a smaller componentwise estimation error and is better at controlling the risk approximation mentioned in the first paragraph of the introduction. However, the comparison between the two methods is not that simple. In implementation, quadratic programming algorithms require the estimated covariance matrix to be positive semi-definite. Therefore, we need to project the covariance matrix estimate based on the "pairwise-refresh" scheme onto the space of positive semi-definite matrices. However, the projection distorts the accuracy of the elementwise estimation. As a result, the pairwise-refresh scheme does not have much advantage over the all-refresh method, though the former is very easy to implement. Nevertheless, both methods significantly outperform the methods based on low-frequency data, since they adapt better to the time-varying volatilities and correlations. The comparative advantage is more dramatic when there are rapid changes in the volatility matrix over time. This will be demonstrated in both simulation and empirical studies.

As mentioned in the introduction and demonstrated in Section 2, the accuracy of the portfolio risk relative to the theoretically optimal portfolio is governed by the maximum elementwise estimation error. How does this error grow with the number of assets? Thanks to the concentration inequalities derived in this paper, it grows only at a logarithmic order of the number of assets. This gives a theoretical justification of why the portfolio selection problem is feasible for vast portfolios.

The paper is organized as follows. Section 2 gives an overview of portfolio allocation using high-frequency data. Section 3 studies volatility matrix estimation using high-frequency data from the perspective of asset allocation, and also presents the analytical results. How well our ideas work in simulation and empirical studies can be found in Sections 4 and 5, respectively. Conclusions are given in Section 6. Technical conditions and proofs are relegated to the appendix.



2 Constrained Portfolio Optimization with High Frequency Data 
2.1 Problem Setup 

Consider a pool of p assets, with log-price processes $X_t^{(1)}, X_t^{(2)}, \ldots, X_t^{(p)}$. Denote by $X_s = (X_s^{(1)}, \ldots, X_s^{(p)})^T$ the vector of the log-price processes at time s. Suppose they follow an Itô process, namely,

$$dX_t = \mu_t\,dt + S_t^{1/2}\,dW_t, \qquad (1)$$

where $W_t$ is the vector of p-dimensional standard Brownian motions. The drift vector $\mu_t$ and the instantaneous variance $S_t$ can be stochastic processes and are assumed to be bounded and independent of $W_t$.

A given portfolio with allocation vector w at time t and holding period τ has log-return $w^T\int_t^{t+\tau}dX_s$ with variance (risk)

$$R_{t,\tau}(w) = w^T\Sigma_{t,\tau}w, \qquad (2)$$

where $w^T\mathbf{1} = 1$ and

$$\Sigma_{t,\tau} = \int_t^{t+\tau}E_tS_u\,du, \qquad (3)$$
with $E_t$ denoting the conditional expectation given the history up to time t. Let $w^+$ be the proportion of long positions and $w^-$ the proportion of short positions. Then $\|w\|_1 = w^+ + w^-$ is the gross exposure of the portfolio. To simplify the problem, following Jagannathan and Ma (2003), Fan et al. (2008b) and other papers in the literature, we consider only the risk optimization problem. In practice, the expected return constraint can be replaced by constraints on sectors or industries, to avoid unreliable estimates of the expected return vector. For a short time horizon, the expected return is usually negligible. Following Fan et al. (2008b), we consider the following risk optimization under gross-exposure constraints:

$$\min_w\ w^T\Sigma_{t,\tau}w, \quad \text{s.t. } \|w\|_1 \le c \ \text{ and } \ w^T\mathbf{1} = 1, \qquad (4)$$

where c is the total exposure allowed. Note that, using $w^+ - w^- = 1$, problem (4) equivalently constrains the proportion of short positions: $w^- \le (c-1)/2$.
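The equivalence between the gross-exposure constraint and the bound on short positions is easy to check numerically. Below is a minimal sketch with a hypothetical three-asset allocation; the helper names are illustrative only and not part of any library.

```python
def exposure_summary(w):
    """Split an allocation w into the long proportion w+, the short
    proportion w-, and the gross exposure ||w||_1 = w+ + w-."""
    w_plus = sum(x for x in w if x > 0)
    w_minus = -sum(x for x in w if x < 0)
    return w_plus, w_minus, w_plus + w_minus


def feasible(w, c, tol=1e-12):
    """Constraints of problem (4): w^T 1 = 1 and ||w||_1 <= c."""
    _, _, gross = exposure_summary(w)
    return abs(sum(w) - 1.0) <= tol and gross <= c + tol


# Hypothetical fully invested allocation over three assets.
w = [0.7, 0.5, -0.2]
w_plus, w_minus, gross = exposure_summary(w)
c = 1.4
# Since w+ - w- = 1, the condition ||w||_1 <= c is the same as w- <= (c - 1) / 2.
print(feasible(w, c), w_minus <= (c - 1) / 2 + 1e-12)
```

Here the gross exposure is 1.4, so the allocation sits exactly on the boundary for c = 1.4 and is infeasible for any smaller c.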

Problem (4) involves the conditional expected volatility matrix (3) in the future. Unless we know the dynamics of the volatility process exactly, this matrix is usually unknown, even if we observed the entire continuous paths up to the current time t. As a result, we rely on an approximation even with ideal data, that is, even if we were able to observe the processes continuously without error. The typical approximation is

$$\tau^{-1}\Sigma_{t,\tau} \approx h^{-1}\int_{t-h}^{t}S_u\,du, \qquad (5)$$

for an appropriate window width h, and we estimate $\int_{t-h}^{t}S_u\,du$ based on the historical data in the time interval [t − h, t].
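As a minimal sketch of how (5) is used, assume synchronized, noise-free log-returns over the window [t − h, t]; this is an idealization, since noise and asynchronous trading are exactly what the later sections address. The integral $\int_{t-h}^{t}S_u\,du$ is then estimated by the plain realized covariance, and $\Sigma_{t,\tau}$ by rescaling with τ/h. The function names are hypothetical.

```python
def realized_covariance(returns):
    """Plain realized covariance: the sum over intervals of the outer products
    of synchronized log-return vectors (one list of p returns per interval)."""
    p = len(returns[0])
    cov = [[0.0] * p for _ in range(p)]
    for r in returns:
        for i in range(p):
            for j in range(p):
                cov[i][j] += r[i] * r[j]
    return cov


def forecast_sigma(window_returns, h, tau):
    """Approximation (5): Sigma_{t,tau} ~ (tau / h) * int_{t-h}^t S_u du,
    with the integral estimated from returns observed in [t - h, t]."""
    rc = realized_covariance(window_returns)
    return [[(tau / h) * v for v in row] for row in rc]


# Hypothetical window of h = 2 days with one return vector per day for two
# assets, used to forecast a holding period of tau = 1 day.
window = [[0.01, 0.02], [-0.01, 0.0]]
print(forecast_sigma(window, h=2.0, tau=1.0))
```

With high-frequency data the same rescaling applies; only the estimator of the window integral changes.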

The approximation (5) holds reasonably well when τ and h are both small. This relies on the continuity assumption: the local time-varying volatility matrices are continuous in time. The approximation is also reasonable when both τ and h are large. This relies on the stationarity assumption, so that both quantities will be approximately $ES_u$ when the stochastic volatility matrix is stationary. The approximation is not good when τ is small whereas h is large, as long as $S_u$ is time varying, whether or not the stochastic volatility is stationary. In other words, when the holding time horizon τ is short, as long as $S_u$ is time varying, we can only use a short time window [t − h, t] to estimate $\Sigma_{t,\tau}$. The recent arrival of high-frequency data makes this problem feasible.

The approximation error in (5) usually cannot be evaluated unless we have a specific parametric model for the stochastic volatility matrix $S_t$. However, this comes at the risk of model misspecification, and a nonparametric approach is usually preferred for high-frequency data. With $p^2$ elements approximated, which can be on the order of hundreds of thousands or millions, a natural question to ask is whether these errors accumulate and whether the result (risk) is stable. The gross-exposure constraint gives a stable solution to the problem, as shown in Fan et al. (2008b).
2.2 Risk approximations with gross exposure constraints



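The utility of the gross-exposure constraint can easily be seen through the following inequality. Let $\hat\Sigma_{t,\tau}$ be an estimated covariance matrix and

$$\hat R_{t,\tau}(w) = w^T\hat\Sigma_{t,\tau}w \qquad (6)$$

be the estimated risk of the portfolio. Then, for any portfolio with gross exposure $\|w\|_1 \le c$, we have

$$\begin{aligned}
|\hat R_{t,\tau}(w) - R_{t,\tau}(w)| &\le \sum_{i=1}^{p}\sum_{j=1}^{p}|\hat\sigma_{ij} - \sigma_{ij}|\,|w_i|\,|w_j| \\
&\le |\hat\Sigma_{t,\tau} - \Sigma_{t,\tau}|_\infty\,\|w\|_1^2 \\
&\le |\hat\Sigma_{t,\tau} - \Sigma_{t,\tau}|_\infty\,c^2,
\end{aligned} \qquad (7)$$

where $\hat\sigma_{ij}$ and $\sigma_{ij}$ are respectively the elements of $\hat\Sigma_{t,\tau}$ and $\Sigma_{t,\tau}$, and

$$|\hat\Sigma_{t,\tau} - \Sigma_{t,\tau}|_\infty = \max_{i,j}|\hat\sigma_{ij} - \sigma_{ij}|$$

is the maximum componentwise estimation error. The risk approximation (7) reveals that there is no error accumulation effect when the gross exposure c is moderate.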
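Since (7) is a deterministic inequality, it can be checked numerically for any pair of symmetric matrices and any allocation. The sketch below does this in plain Python; the matrices and the allocation are hypothetical values chosen for illustration, and the perturbed matrix is not required to be positive semi-definite.

```python
import random


def risk(w, S):
    """Portfolio variance w^T S w."""
    p = len(w)
    return sum(w[i] * S[i][j] * w[j] for i in range(p) for j in range(p))


def max_abs_error(A, B):
    """Maximum componentwise difference |A - B|_inf."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


rng = random.Random(0)
p = 5
# Hypothetical "true" covariance matrix.
Sigma = [[0.04 if i == j else 0.01 for j in range(p)] for i in range(p)]
# Symmetric elementwise perturbation standing in for an estimate.
Sigma_hat = [row[:] for row in Sigma]
for i in range(p):
    for j in range(i, p):
        e = rng.uniform(-0.005, 0.005)
        Sigma_hat[i][j] += e
        if j != i:
            Sigma_hat[j][i] += e

w = [0.6, 0.5, 0.3, -0.2, -0.2]            # sums to 1, gross exposure 1.8
gap = abs(risk(w, Sigma_hat) - risk(w, Sigma))
bound = max_abs_error(Sigma_hat, Sigma) * sum(abs(x) for x in w) ** 2
print(gap, bound)                          # (7) guarantees gap <= bound
```

The bound scales with the square of the gross exposure, so it stays informative only while c is moderate.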

From now on, we drop the dependence on t and τ whenever there is no confusion. This simplifies the notation.



Fan et al. (2008b) further showed that the risks of the optimal portfolios are indeed close. Let

$$w_{\rm opt} = \arg\min_{w^T\mathbf{1} = 1,\ \|w\|_1 \le c} R(w), \qquad \hat w_{\rm opt} = \arg\min_{w^T\mathbf{1} = 1,\ \|w\|_1 \le c} \hat R(w) \qquad (8)$$

be respectively the theoretical (oracle) optimal allocation vector we want and the estimated optimal allocation vector we get. Then $R(w_{\rm opt})$ is the theoretical minimum risk and $R(\hat w_{\rm opt})$ is the actual risk of our selected portfolio, whereas $\hat R(\hat w_{\rm opt})$ is our perceived risk, which is the quantity known to financial econometricians. They showed that

$$|R(\hat w_{\rm opt}) - R(w_{\rm opt})| \le 2a_pc^2, \qquad (9)$$
$$|R(\hat w_{\rm opt}) - \hat R(\hat w_{\rm opt})| \le a_pc^2, \qquad (10)$$
$$|\hat R(\hat w_{\rm opt}) - R(w_{\rm opt})| \le a_pc^2, \qquad (11)$$

with $a_p = |\hat\Sigma - \Sigma|_\infty$, which usually grows slowly with the number of assets p. These bounds reveal that the three relevant risks are in fact close as long as the gross-exposure parameter c is moderate and the maximum componentwise estimation error $a_p$ is small.
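The bounds (9)–(11) follow from (7) in a few lines. The following sketch of the argument assumes that the minimizers in (8) exist. Both $w_{\rm opt}$ and $\hat w_{\rm opt}$ satisfy $\|w\|_1 \le c$, so (7) gives $|\hat R(w) - R(w)| \le a_pc^2$ at each of them, and (10) is this bound at $w = \hat w_{\rm opt}$. Using the optimality relations $R(w_{\rm opt}) \le R(\hat w_{\rm opt})$ and $\hat R(\hat w_{\rm opt}) \le \hat R(w_{\rm opt})$,

$$\begin{aligned}
0 \le R(\hat w_{\rm opt}) - R(w_{\rm opt})
&= \big[R(\hat w_{\rm opt}) - \hat R(\hat w_{\rm opt})\big] + \big[\hat R(\hat w_{\rm opt}) - \hat R(w_{\rm opt})\big] + \big[\hat R(w_{\rm opt}) - R(w_{\rm opt})\big] \\
&\le a_pc^2 + 0 + a_pc^2 = 2a_pc^2,
\end{aligned}$$

which is (9). For (11), $\hat R(\hat w_{\rm opt}) \le \hat R(w_{\rm opt}) \le R(w_{\rm opt}) + a_pc^2$ and $\hat R(\hat w_{\rm opt}) \ge R(\hat w_{\rm opt}) - a_pc^2 \ge R(w_{\rm opt}) - a_pc^2$.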

The above risk approximations hold for any estimate of the covariance matrix. They do not even require $\hat\Sigma$ to be a positive semi-definite matrix. This significantly facilitates covariance matrix estimation. In particular, elementwise estimation methods are allowed. In fact, since the approximation errors in (9), (10) and (11) are all controlled by the maximum elementwise estimation error, it can be advantageous to use elementwise estimation methods. This is particularly the case for high-frequency data, where trading is non-synchronized. The synchronization can be done pairwise or for all assets. The former retains many more data points than the latter, as shown in the next section.



3 Estimation of Covariance Matrix Using High Frequency Data 



3.1 All-refresh method and pairwise-refresh method 



Estimating a high-dimensional volatility matrix using high-frequency data is a challenging task. One of the challenges is the non-synchronicity of trading. Several synchronization schemes have been proposed. The refresh time method is proposed in Barndorff-Nielsen et al. (2008) and the previous-tick method is proposed in Zhang (2009). The former uses the available data more efficiently and will be used in this paper.

The idea of refresh time is to wait until all assets have traded at least once, at time $v_1$ (say), and then use the last price traded before or at $v_1$ of each asset as its price at time $v_1$. This yields one synchronized price vector at time $v_1$. The clock then starts again. Wait until all assets have traded at least once at time $v_2$ (say), and again use the previous-tick price of each asset as its price at time $v_2$. This yields the second synchronized price vector at time $v_2$. Repeat the process until all available trading data are synchronized. Clearly, the process discards a large portion of the available trades: after each synchronization, we always wait until the slowest stock trades once. Nevertheless, among schemes that synchronize all assets at once, it is the most efficient. We will refer to this synchronization scheme as the "all-refresh time" (the method is called the all-refresh method for short). Barndorff-Nielsen et al. (2008) advocate the kernel method to estimate the integrated volatility matrix after synchronization, but this can also be done by other methods. The advantage of the all-refresh method is that the estimated covariance matrix can be made positive semi-definite.
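The all-refresh scheme just described can be sketched in a few lines of Python, assuming each asset's trade times are given as a sorted list on (0, 1] with prices in a parallel list. The function names are hypothetical; the previous-tick rule picks, for each asset, the last trade at or before each refresh time.

```python
from bisect import bisect_right


def all_refresh_times(trade_times):
    """All-refresh times v_1, v_2, ... (v_0 = 0 is implicit): each v_k is the
    moment by which every asset has traded at least once after v_{k-1}.
    trade_times is a list of sorted trade-time lists, one per asset."""
    refresh, v = [], 0.0
    while True:
        next_trades = []
        for times in trade_times:
            k = bisect_right(times, v)         # index of the first trade after v
            if k == len(times):
                return refresh                 # some asset has no further trade
            next_trades.append(times[k])
        v = max(next_trades)                   # wait for the slowest asset
        refresh.append(v)


def previous_tick(times, prices, v):
    """Last price traded at or before v. At a refresh time every asset has
    traded at least once, so the index used here is never negative there."""
    return prices[bisect_right(times, v) - 1]


def synchronize(trade_times, trade_prices):
    """Synchronized price vectors at the all-refresh times."""
    v = all_refresh_times(trade_times)
    prices = [[previous_tick(t, x, vk) for t, x in zip(trade_times, trade_prices)]
              for vk in v]
    return v, prices


# Hypothetical trades of two assets on (0, 1].
times = [[0.1, 0.2, 0.5, 0.9], [0.3, 0.4, 0.8]]
prices = [[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2]]
print(synchronize(times, prices))
```

Adding a third, rarely traded asset with a single trade at 0.6 would collapse the synchronized sample to one point, which illustrates how much data the all-refresh scheme can discard.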

A more efficient way to use the available sample is the pairwise-refresh time scheme, which synchronizes the trading for each pair of assets separately (the method is called the pairwise-refresh method for short). This retains far more data points, but we have to estimate the covariance matrix elementwise. The resulting covariance matrix is not necessarily positive semi-definite. Thanks to the gross-exposure constraint, this estimate is still applicable to portfolio selection problems, as long as the elementwise estimation error is small; see (7)–(11). The pairwise-refresh scheme makes far more efficient use of the rich information in high-frequency data, and enables us to estimate each element of the volatility matrix more precisely, which helps improve the efficiency of the selected portfolio. We will study the merits of these two methods.



The pairwise estimation method allows us to use a wealth of univariate integrated volatility estimators, such as the two-scale realized volatility (TSRV) (Zhang et al., 2005), the multi-scale realized volatility (MSRV) (Zhang, 2006), the wavelet method (Fan and Wang, 2007), the realized kernel method (Barndorff-Nielsen et al., 2009), the pre-averaging approach (Jacod et al., 2009) and the QMLE method (Xiu, 2008). For any two assets with log-price processes $X_t^{(i)}$ and $X_t^{(j)}$, the synchronized prices of $X_t^{(i)} + X_t^{(j)}$ and $X_t^{(i)} - X_t^{(j)}$ can be computed at the pairwise-refresh times. With the univariate estimates of the integrated volatilities $\widehat{\langle X^{(i)} + X^{(j)}\rangle}$ and $\widehat{\langle X^{(i)} - X^{(j)}\rangle}$, the integrated covariation can be estimated as

$$\widehat{\langle X^{(i)}, X^{(j)}\rangle} = \Big(\widehat{\langle X^{(i)} + X^{(j)}\rangle} - \widehat{\langle X^{(i)} - X^{(j)}\rangle}\Big)\Big/4. \qquad (12)$$

In particular, the diagonal elements are estimated by the method itself. When TSRV is used, this results in the two-scale realized covariance (TSCV) estimate (Zhang, 2009).
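The polarization step (12) works with any univariate estimator. A minimal sketch follows, using the plain realized variance as a stand-in building block; the function names and data are hypothetical. Because realized (co)variance is bilinear, (12) then reproduces the realized covariance exactly.

```python
def realized_variance(x):
    """Plain realized variance of a synchronized log-price path x[0..n]."""
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))


def polarized_covariance(x, y, univariate=realized_variance):
    """Estimate <X, Y> as in (12): (<X + Y> - <X - Y>) / 4, where `univariate`
    is any integrated-volatility estimator applied to the synchronized paths."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (univariate(plus) - univariate(minus)) / 4.0


# Hypothetical synchronized log-prices of two assets.
x = [0.0, 0.01, 0.03, 0.02]
y = [0.0, -0.01, 0.00, 0.02]
print(polarized_covariance(x, y))
```

Passing a noise-robust estimator such as TSRV as `univariate` yields the corresponding covariance estimate without any change to the polarization step.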



3.2 Pairwise refresh method and TSCV 

We now focus on the pairwise estimation method. To simplify the notation, we reintroduce it. We consider two log-price processes X and Y that satisfy

$$dX_t = \mu_t^{(X)}dt + \sigma_t^{(X)}dB_t^{(X)} \quad\text{and}\quad dY_t = \mu_t^{(Y)}dt + \sigma_t^{(Y)}dB_t^{(Y)}, \qquad (13)$$

where $\mathrm{cor}(B_t^{(X)}, B_t^{(Y)}) = \rho_t^{(X,Y)}$. For the two processes X and Y, consider the problem of estimating $\langle X, Y\rangle_T$ with T = 1. Denote by $\mathcal{T}_n$ the observation times of X and by $\mathcal{S}_m$ the observation times of Y. Denote the elements of $\mathcal{T}_n$ and $\mathcal{S}_m$ by $\{\tau_{n,i}\}_{i=0}^{n}$ and $\{\theta_{m,i}\}_{i=0}^{m}$, respectively, in ascending order ($\tau_{n,0}$ and $\theta_{m,0}$ are set to be 0). We assume that the actual log-prices are not observable, but are observed with microstructure noise:

$$X^o_{\tau_{n,i}} = X_{\tau_{n,i}} + \epsilon_i^X \quad\text{and}\quad Y^o_{\theta_{m,i}} = Y_{\theta_{m,i}} + \epsilon_i^Y, \qquad (14)$$

where $X^o$ and $Y^o$ are the observed transaction prices on the logarithmic scale, and X and Y are the latent log-prices governed by the stochastic dynamics (13). We assume that the microstructure noise processes $\epsilon^X$ and $\epsilon^Y$ are independent of the X and Y processes and that

$$\epsilon_i^X \overset{\text{i.i.d.}}{\sim} N(0, \eta_X^2) \quad\text{and}\quad \epsilon_i^Y \overset{\text{i.i.d.}}{\sim} N(0, \eta_Y^2). \qquad (15)$$

Note that this assumption is mainly for simplicity of presentation; as can be seen from the proofs, one can, for example, easily replace the Gaussian assumption with a sub-Gaussian assumption without affecting our results.
The pairwise refresh times $\mathcal{V} = \{v_0, v_1, \ldots, v_{\tilde n}\}$ can be obtained by setting $v_0 = 0$ and

$$v_i = \max\Big\{\min\{\tau \in \mathcal{T}_n : \tau > v_{i-1}\},\ \min\{\theta \in \mathcal{S}_m : \theta > v_{i-1}\}\Big\},$$

where $\tilde n$ is the total number of refresh times in the interval (0, 1]. The actual sample times for the two individual processes X and Y that correspond to the refresh times are

$$t_i = \max\{\tau \in \mathcal{T}_n : \tau \le v_i\} \quad\text{and}\quad s_i = \max\{\theta \in \mathcal{S}_m : \theta \le v_i\},$$

which is really the previous-tick measurement.
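The definition above translates directly into code. Below is a minimal sketch, assuming sorted lists of observation times on (0, 1] and $v_0 = 0$; the function name is hypothetical. It returns the refresh times $v_i$ together with the previous-tick sample times $t_i$ and $s_i$.

```python
from bisect import bisect_right


def pairwise_refresh(T, S):
    """Pairwise refresh times v_1..v_n (v_0 = 0) for sorted observation times
    T of X and S of Y, with the previous-tick sample times t_i and s_i."""
    v, t, s = [], [], []
    prev = 0.0
    while True:
        a, b = bisect_right(T, prev), bisect_right(S, prev)
        if a == len(T) or b == len(S):
            break                              # one process has no later tick
        vi = max(T[a], S[b])                   # both have traded after v_{i-1}
        v.append(vi)
        t.append(T[bisect_right(T, vi) - 1])   # t_i = max{tau in T: tau <= v_i}
        s.append(S[bisect_right(S, vi) - 1])   # s_i = max{theta in S: theta <= v_i}
        prev = vi
    return v, t, s


# Hypothetical observation times on (0, 1].
T = [0.1, 0.2, 0.5, 0.9]
S = [0.3, 0.4, 0.8]
print(pairwise_refresh(T, S))
```

Because only two assets are synchronized at a time, the number of retained points depends on the pair's own liquidity rather than on the slowest asset in the whole pool.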

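We study the properties of the TSCV based on the asynchronous data:

$$\widehat{\langle X, Y\rangle}_1 = [X^o, Y^o]_1^{(K)} - \frac{\bar n_K}{\bar n_J}[X^o, Y^o]_1^{(J)}, \qquad (16)$$

where

$$[X^o, Y^o]_1^{(K)} = \frac{1}{K}\sum_{i=K}^{\tilde n}\big(X^o_{t_i} - X^o_{t_{i-K}}\big)\big(Y^o_{s_i} - Y^o_{s_{i-K}}\big)$$

and $\bar n_K = (\tilde n - K + 1)/K$. As discussed in Zhang (2009), the optimal choice of K has order $K = O(\tilde n^{2/3})$, and J can be taken to be a constant such as 1. In the following analysis, we consider the specific case J = 1 (or $\bar n_J = \tilde n$) and $\bar n_K = O(\tilde n^{1/3})$.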
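Below is a minimal sketch of the TSCV (16) with J = 1 on previous-tick samples. The function names and the tuning constant in `choose_K` are hypothetical; with y = x the same function computes the TSRV (19). The illustration uses a simulated Brownian path with constant volatility and i.i.d. Gaussian noise, not data from this paper, and contrasts the noise-inflated realized variance with the two-scale estimate.

```python
import math
import random


def subsampled_rc(x, y, K):
    """[X, Y]^{(K)} = (1/K) * sum_{i=K}^{n} (x_i - x_{i-K}) * (y_i - y_{i-K})."""
    n = len(x) - 1
    return sum((x[i] - x[i - K]) * (y[i] - y[i - K]) for i in range(K, n + 1)) / K


def tscv(x, y, K, J=1):
    """TSCV (16) on previous-tick samples x[i] = X^o_{t_i}, y[i] = Y^o_{s_i},
    i = 0..n, with nbar_K = (n - K + 1) / K. With y = x it is the TSRV (19)."""
    n = len(x) - 1
    nbar_K = (n - K + 1) / K
    nbar_J = (n - J + 1) / J
    return subsampled_rc(x, y, K) - (nbar_K / nbar_J) * subsampled_rc(x, y, J)


def choose_K(n, const):
    """K of order n^{2/3}; `const` is a hypothetical tuning constant."""
    return max(2, round(const * n ** (2.0 / 3.0)))


# Illustration (diagonal case): a Brownian path with integrated variance 1
# on (0, 1], observed n times with i.i.d. N(0, eta^2) noise as in (15).
rng = random.Random(1)
n, eta = 23400, 0.005
X = [0.0]
for _ in range(n):
    X.append(X[-1] + math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0))
Xo = [v + rng.gauss(0.0, eta) for v in X]

naive = subsampled_rc(Xo, Xo, 1)   # realized variance, inflated by about 2 * n * eta^2
two_scale = tscv(Xo, Xo, choose_K(n, 0.1))
print(naive, two_scale)
```

The correction term scaled by $\bar n_K/\bar n_J$ removes the noise bias of the slow-scale estimate in expectation, which is why the two-scale value lands near the true integrated variance while the naive one does not.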
When either microstructure error or asynchronicity exists, the realized covariance is seriously biased. An asymptotic normality result in Zhang (2009) reveals that TSCV can simultaneously remove the bias due to the microstructure error and the bias due to the asynchronicity. However, this result is not adequate for our application to vast volatility matrix estimation. The maximum componentwise estimation error $a_p$ depends on the number of assets p. To understand its impact on $a_p$, we need to establish concentration inequalities. In particular, if for a sufficiently large $|x| = O((\log p)^a)$,

$$\max_{i,j}P\{\sqrt{n}\,|\hat\sigma_{ij} - \sigma_{ij}| > x\} < \exp(-Cx^{1/a}) \qquad (17)$$

for two positive constants a and C, then

$$a_p = |\hat\Sigma - \Sigma|_\infty = O_P\Big(\frac{(\log p)^a}{\sqrt{n}}\Big). \qquad (18)$$

We will show in the next section that the result indeed holds with a = 1/2 and with $\sqrt{n}$ replaced by $n^{1/6}$, where n is the minimum subsample size. Hence the impact of the number of assets is limited, only of logarithmic order.
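The step from (17) to (18) is a union bound over the $p^2$ entries. A sketch, assuming (17) applies at the chosen x:

$$P\big(\sqrt{n}\,a_p > x\big) \le \sum_{i,j}P\big\{\sqrt{n}\,|\hat\sigma_{ij} - \sigma_{ij}| > x\big\} \le p^2\exp(-Cx^{1/a}).$$

Choosing $x = (3C^{-1}\log p)^a$, which is of order $(\log p)^a$, makes the right-hand side $p^2\cdot p^{-3} = p^{-1}$. Hence $\sqrt{n}\,a_p = O_P((\log p)^a)$ as p grows, which is (18).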
3.3 Concentration Inequalities



Inequality (17) requires conditions on both the diagonal and the off-diagonal elements. Technically, they are treated differently. For the diagonal case, the problem corresponds to the estimation of integrated volatility and there is no issue of asynchronicity. TSCV (16) reduces to TSRV (Zhang et al., 2005), which is explicitly given by

$$\widehat{\langle X, X\rangle}_1 = [X^o, X^o]_1^{(K)} - \frac{\bar n_K}{\bar n_J}[X^o, X^o]_1^{(J)}, \qquad (19)$$

where

$$[X^o, X^o]_1^{(K)} = \frac{1}{K}\sum_{i=K}^{n}\big(X^o_{\tau_{n,i}} - X^o_{\tau_{n,i-K}}\big)^2$$

and $\bar n_K = (n - K + 1)/K$. As shown in Zhang et al. (2005), the optimal choice of K has order $K = O(n^{2/3})$, and J can be taken to be a constant such as 1. Again, for the TSRV, in the following analysis we consider the specific case J = 1 (or $\bar n_J = n$) and $\bar n_K = O(n^{1/3})$.

To facilitate the reading, we relegate the technical conditions and proofs to the appendix. 
The following two results establish the concentration inequalities for the integrated volatility and 
integrated covariation. 



Theorem 1. Let the X process be as in (13), and let n be the total number of observations for the X process during the time interval (0, 1]. Under Conditions 1–4 in Section A.1, for $x \in [0, cn^{1/6}]$,

$$P\Big\{n^{1/6}\Big|\widehat{\langle X, X\rangle}_1 - \int_0^1(\sigma_t^{(X)})^2\,dt\Big| > x\Big\} \le 4\exp(-Cx^2)$$

for positive constants c and C. A set of candidate values for c and C is given in the appendix for the case when the TSRV parameters are chosen according to Condition 5.

Theorem 2. Let the X and Y processes be as in (13), and let n be the total number of refresh times for the processes X and Y during the time interval (0, 1]. Under Conditions 1–5 in Section A.1, for $x \in [0, cn^{1/6}]$,

$$P\Big\{n^{1/6}\Big|\widehat{\langle X, Y\rangle}_1 - \int_0^1\sigma_t^{(X)}\sigma_t^{(Y)}\rho_t^{(X,Y)}\,dt\Big| > x\Big\} \le 8\exp(-Cx^2)$$

for positive constants c and C. A set of candidate values for c and C is given in the appendix for the case when the TSCV parameters are chosen according to Condition 5.
3.4 Error rates on risk approximations



With the above concentration inequalities, we can now readily give an upper bound on the risk approximations. Consider the p log-price processes as in Section 2.1. Suppose the processes are observed with market microstructure noise. Let $n^{(i,j)}$ be the observation frequency obtained by the pairwise-refresh method for the two processes $X^{(i)}$ and $X^{(j)}$, and let $n_*$ be the observation frequency obtained by the all-refresh method. Clearly, $n^{(i,j)}$ is typically much larger than $n_*$. Hence, most elements are estimated more accurately using the pairwise-refresh method than using the all-refresh method. On the other hand, for less liquidly traded pairs, the observation frequency of pairwise-refresh times cannot be an order of magnitude larger than $n_*$.

Using (18), an application of Theorems 1 and 2 to each element of the estimated integrated covariance matrix yields

$$a_p^{\text{pairwise-refresh}} = |\hat\Sigma^{\text{pairwise-refresh}} - \Sigma|_\infty = O_P\Big(\frac{(\log p)^{1/2}}{n_{\min}^{1/6}}\Big), \qquad (20)$$

where $n_{\min} = \min_{i,j}n^{(i,j)}$ is the minimum number of observations of the pairwise-refresh time.

Note that our proofs do not rely on particular properties of pairwise-refresh times, so the results of Theorem 1 and Theorem 2 are applicable to the all-refresh method as well, with the observation frequency of the pairwise-refresh times replaced by that of the all-refresh times. Hence, using the all-refresh time scheme, we have

$$a_p^{\text{all-refresh}} = |\hat\Sigma^{\text{all-refresh}} - \Sigma|_\infty = O_P\Big(\frac{(\log p)^{1/2}}{n_*^{1/6}}\Big). \qquad (21)$$

Clearly, $n_{\min}$ is larger than $n_*$; see Figure 5. Hence, the pairwise-refresh method gives a somewhat more accurate estimate in terms of the maximum elementwise estimation error.



3.5 Projections of estimated volatility matrices 

The risk approximations (9)–(11) hold for any solution to (8), whether or not the matrix $\hat\Sigma$ is positive semi-definite. However, convex optimization algorithms typically require the matrix $\hat\Sigma$ to be positive semi-definite. Yet the estimates based on elementwise estimation sometimes do not satisfy this, and even the one from the all-refresh method can have the same problem if TSRV is used. This leads to the question of how to project a symmetric matrix onto the space of positive semi-definite matrices.

There are two intuitive methods for projecting a p × p symmetric matrix A onto the space of positive semi-definite matrices. Consider the eigendecomposition $A = \Gamma^T\mathrm{diag}(\lambda_1, \ldots, \lambda_p)\Gamma$, where Γ is an orthogonal matrix consisting of p eigenvectors. The two intuitively appealing projection methods are

$$A_1 = \Gamma^T\mathrm{diag}(\lambda_1^+, \ldots, \lambda_p^+)\Gamma, \qquad (22)$$

where $\lambda_j^+$ is the positive part of $\lambda_j$, and

$$A_2 = (A + \lambda_{\min}^- I_p)/(1 + \lambda_{\min}^-), \qquad (23)$$

where $\lambda_{\min}^-$ is the negative part of the minimum eigenvalue. For both projection methods, the eigenvectors remain the same as those of A. When A is a positive semi-definite matrix, we obviously have $A_1 = A_2 = A$.

In applications, we apply the above transformations to the estimated correlation matrix A rather than directly to the volatility matrix estimate $\hat\Sigma$. The correlation matrix A has diagonal elements equal to 1. The resulting matrix under projection method (23) still satisfies this property, whereas the one under projection method (22) does not. As a result, projection method (23) keeps the integrated volatility of each asset intact.
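A minimal NumPy sketch of the two projections and of their application on the correlation scale follows. The function names are hypothetical. `numpy.linalg.eigh` returns $A = V\,\mathrm{diag}(\lambda)V^T$, so $V = \Gamma^T$ in the notation above. The test matrix is a made-up example, not an estimate from this paper.

```python
import numpy as np


def project_positive_part(A):
    """Projection (22): keep the eigenvectors of A and replace each eigenvalue
    lambda_j by its positive part lambda_j^+."""
    lam, V = np.linalg.eigh(A)                 # A = V diag(lam) V^T
    return (V * np.clip(lam, 0.0, None)) @ V.T


def project_shift(A):
    """Projection (23): (A + lam_min^- I) / (1 + lam_min^-), where lam_min^- is
    the negative part of the smallest eigenvalue; a unit diagonal stays unit."""
    neg = max(-np.linalg.eigvalsh(A).min(), 0.0)
    return (A + neg * np.eye(A.shape[0])) / (1.0 + neg)


def project_covariance(Sigma_hat, projection=project_shift):
    """Project the correlation matrix implied by Sigma_hat and map back, so that
    (with project_shift) the diagonal of Sigma_hat is unchanged."""
    d = np.sqrt(np.diag(Sigma_hat))
    scale = np.outer(d, d)
    return projection(Sigma_hat / scale) * scale


# Hypothetical symmetric "correlation" estimate that is not positive
# semi-definite (its eigenvalues are -0.8, 1.9 and 1.9).
A = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
print(np.diag(project_positive_part(A)), np.diag(project_shift(A)))
```

On this example, projection (22) inflates every diagonal entry to 1 + 0.8/3, while projection (23) keeps the unit diagonal, which is the property that preserves each asset's integrated volatility.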

In the simulation and empirical studies, we applied both projections. It turns out that there is no significant difference between the two projection methods in terms of results. We decided to apply only projection (23) in all numerical studies, as it keeps the individual volatility estimates intact.

3.6 Comparisons between pairwise-refresh and all-refresh methods 

The pairwise-refresh method keeps far richer information in the high-frequency data than the all-refresh method; see Figure 2. Thus, it is expected to estimate each element more precisely. Yet the estimated correlation matrix is typically not positive semi-definite. As a result, projection (23) can distort the accuracy of the elementwise estimation. On the other hand, the estimate from the all-refresh method is typically positive semi-definite or nearly so. Property (12) typically entails positive semi-definiteness, as long as the volatility estimator for $\langle X, X\rangle$ is always nonnegative. For example, using the realized kernel method as the building block, a positive semi-definite version can easily be obtained. Therefore, projection (23) has less impact on the all-refresh method than on the pairwise-refresh method.

Risk approximations (9)–(11) are only upper bounds. The upper bounds are controlled by $a_p$, whose rates of convergence are governed by (20) and (21). While the average number of observations of pairwise-refresh times is far larger than the number of observations of all-refresh times, the minimum number of observations of pairwise-refresh times, $n_{\min}$, is not much larger than $n_*$. Therefore, the upper bounds (20) and (21) are approximately of the same order. This, together with the distortion due to projection, does not leave much advantage for the pairwise-refresh method.



4 Simulation Studies 

In this section, we simulate market trading data using a reasonable stochastic model. As the latent prices and dynamics of the simulations are known, our study of the risk profile is facilitated. Simulation is a good tool to verify our theoretical results and to quantify the finite-sample behavior. In particular, we would like to demonstrate that approaches based on high-frequency data have a better risk profile than those based on low-frequency data.

Throughout this paper, risk refers to the standard deviation of portfolio returns. To avoid ambiguity, we call $\sqrt{R(w_{\rm opt})}$ the theoretical optimal risk or oracle risk, $\sqrt{\hat R(\hat w_{\rm opt})}$ the perceived optimal risk, and $\sqrt{R(\hat w_{\rm opt})}$ the actual risk of the perceived optimal allocation.



4.1 Design of Simulations 



A slightly modified version of the simulation model in Barndorff-Nielsen et al. (2008) is used to generate the latent price processes of p traded assets. It is a multivariate factor model with stochastic volatilities. Specifically, the latent log-prices $X_t^{(i)}$ follow



(24) 



where the elements of B, W and Z are independent standard Brownian motions. The spot volatilities obey independent Ornstein-Uhlenbeck processes:



(25) 



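where $\varrho_t^{(i)} = \log\sigma_t^{(i)}$ and the Brownian motion driving (25) is independent of all other processes. The stationary distribution is given by $N\big(\beta_0^{(i)}, [\beta_1^{(i)}]^2/(2\alpha^{(i)})\big)$. The integrated quadratic variation and covariation are given by

$$\langle X^{(i)}, X^{(i)}\rangle_t = \int_0^t \big(\sigma_s^{(i)}\big)^2\,ds + \big(\nu^{(i)}\big)^2 t,$$

$$\langle X^{(i)}, X^{(j)}\rangle_t = \int_0^t \sqrt{1 - (\rho^{(i)})^2}\,\sqrt{1 - (\rho^{(j)})^2}\,\sigma_s^{(i)}\sigma_s^{(j)}\,ds, \qquad i \ne j.$$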

The analytic formula for the conditional covariance matrix in (3) can be derived for our model, but we do not report it for brevity.



The number of assets p is taken to be 50. Slightly modified from Barndorff-Nielsen et al. (2008), the parameters are set to $(\mu^{(i)}, \beta_0^{(i)}, \beta_1^{(i)}, \alpha^{(i)}, \rho^{(i)}) = (0.03x^{(i)}, -x^{(i)}, 0.75x^{(i)}, -x^{(i)}/40, -0.7)$,
Figure 1: The volatility and asset price processes of 10 simulated assets.



where $x^{(i)}$ is an independent realization from the uniform distribution on [0.7, 1.3]. The parameters are kept fixed during the simulations. In addition, $\nu^{(i)} = \exp(\beta_0^{(i)})$, which makes the volatility matrix well conditioned.

The model (24) is used to generate the latent log-prices with initial values $X_0^{(i)} = 1$ (log-price) and $\varrho_0^{(i)}$ drawn from its stationary distribution. The Euler scheme is used to generate latent prices at a frequency of once per second. To account for market microstructure noise, Gaussian noises $\epsilon_t^{(i)} \sim N(0, \omega^2)$ with $\omega = 0.0005$ are added. Therefore, as in (14), the observed log-prices are $X_t^{o(i)} = X_t^{(i)} + \epsilon_t^{(i)}$. To give a sense of how much the asset volatilities $\sigma_t^{(i)}$ and prices $P_t^{(i)} = \exp(X_t^{o(i)})$ vary through time, Figure 1 plots the volatility and price processes of 10 assets over a year.

To model non-synchronicity, p independent Poisson processes with intensity parameters $\lambda_1, \lambda_2, \dots, \lambda_p$ are used to simulate the trading times of the assets. Motivated by US equity trading data (a regular US equity trading day has 23400 seconds), we set the trading intensity parameters to $\lambda_i = 0.02i \times 23400$ for $i = 1, 2, \dots, 50$, meaning that the average numbers of daily trades of the assets form an arithmetic sequence spread over the interval [468, 23400].
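The observation step can be sketched as follows. This is a minimal sketch assuming the latent log-prices are already available on the one-second grid; the helper names and the choice to give each trade the latent value of the last completed second (previous-tick) are illustrative assumptions rather than the paper's code. Each asset's trade times come from a homogeneous Poisson process (given the count, the times are i.i.d. uniform), and each observed log-price is the latent value plus $N(0, \omega^2)$ noise.

```python
import numpy as np

SECONDS_PER_DAY = 23400        # regular US equity trading day, in seconds
OMEGA = 0.0005                 # noise standard deviation used in the simulation

def trading_intensities(p=50):
    """Expected trades per day: lambda_i = 0.02 * i * 23400, i = 1, ..., p."""
    return 0.02 * np.arange(1, p + 1) * SECONDS_PER_DAY

def observe(latent, lam, rng, omega=OMEGA):
    """Poisson trade times and noisy observed log-prices for one asset over one day.

    latent: latent log-prices on the one-second grid (length 23400 + 1).
    lam:    expected number of trades per day.
    """
    n_sec = len(latent) - 1
    # Homogeneous Poisson process on [0, n_sec): draw the count, then place the
    # trades as sorted i.i.d. uniform times (equivalent in distribution).
    count = rng.poisson(lam * n_sec / SECONDS_PER_DAY)
    times = np.sort(rng.uniform(0.0, n_sec, size=count))
    # Assumption: a trade carries the latent value of the last completed second.
    idx = np.floor(times).astype(int)
    observed = latent[idx] + rng.normal(0.0, omega, size=count)
    return times, observed
```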



4.2 An oracle investment strategy and risk assessment 

An oracle investment strategy is usually a decent benchmark against which other portfolio strategies can be compared. There are several oracle strategies. The one we choose makes the portfolio allocation based on the covariance matrix estimated from latent prices at the finest grid (one per second). Latent prices are the noise-free prices of each asset at every time point (one per second); they are unobservable in practice and available to us only in the simulation. Therefore, for each asset, there are 23400 latent prices in a normal trading day. We refer to the investment strategy based on the latent prices as the oracle or latent strategy. This strategy is not available in the empirical studies.

The assessment of risk is based on high-frequency data. For a given portfolio strategy, its risk is computed from the latent prices at the finest grid (one per second) in the in-sample simulation studies and from the latent prices at every 15 minutes in the out-of-sample simulation studies, whereas in the empirical studies the observed prices at every 15 minutes are used to assess its risk. This mitigates the influence of the microstructure noise. In the empirical study, we do not hold positions overnight and are therefore immune to overnight price jumps (we discuss the details in Section 5).
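The 15-minute risk assessment can be sketched as follows. This is a minimal illustration under stated assumptions: prices are given on the one-second grid for each day, the target weights are applied to the simple returns of every 15-minute interval (weight drift within an interval is ignored), and annualization uses 26 intervals per day and 252 trading days per year; the 252-day convention is an assumption not stated in this section. The function names are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 23400                       # regular US trading day
BUCKET = 15 * 60                              # 15 minutes in seconds
BUCKETS_PER_DAY = SECONDS_PER_DAY // BUCKET   # 26 intervals per day
TRADING_DAYS = 252                            # assumed annualization convention

def fifteen_minute_returns(prices_1s):
    """Simple returns over consecutive 15-minute intervals for one day.

    prices_1s: array of shape (23400 + 1, p), prices on the one-second grid.
    """
    sampled = prices_1s[::BUCKET]             # 27 sampling points -> 26 returns
    return sampled[1:] / sampled[:-1] - 1.0

def annualized_risk(weights, daily_prices):
    """Annualized standard deviation of 15-minute portfolio returns over many days."""
    port = np.concatenate([fifteen_minute_returns(day) @ weights
                           for day in daily_prices])
    return float(np.std(port, ddof=1) * np.sqrt(BUCKETS_PER_DAY * TRADING_DAYS))
```

Since each day starts from its own opening price, no overnight return enters the calculation, mirroring the no-overnight-holding rule above.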

4.3 In-sample Risk Approximation and Optimal Allocation 

Based on the past h = 1 day, the TSCV covariance matrix estimated from the latent prices at the finest grid (called the latent covariance for short) is regarded as the true covariance matrix. There are several methods for estimating the covariance matrix from observed non-synchronized high-frequency data with microstructure noise. In particular, we employ the all-refresh TSCV covariance matrix (all-refresh TSCV covariance), the all-refresh realized kernel covariance matrix (all-refresh RK covariance, for short) and the pairwise-refresh TSCV covariance matrix (pairwise-refresh TSCV covariance). The all-refresh RK covariance is included because it is positive semi-definite and therefore suffers no distortion from projection. The latent covariance serves as the oracle covariance matrix from which the actual risk of any portfolio is computed. The condition number of the latent covariance of the p = 50 assets ranges from 192.27 to 226.46, with median 210.34, across 100 simulations. The medians of the minimum and maximum eigenvalues are 0.0004 and 0.0838, respectively. For the all-refresh RK approach, the bandwidth H of the realized kernel is chosen to be 1, which gives the best risk profile in our numerical analysis.
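A two-scale covariance estimator in the spirit of TSCV can be sketched for one pair of already-synchronized series. This is a minimal sketch assuming J = 1 (the full-grid realized covariance as the fast scale, as in Condition 4 of the appendix) and a user-chosen slow scale K; the paper's exact estimator and tuning are defined elsewhere, so `tscv` is a hypothetical helper.

```python
import numpy as np

def tscv(x, y, K):
    """Two-scale realized covariance of synchronized log-prices x and y (J = 1).

    Averages the K sparse-grid realized covariances built from lag-K increments
    and subtracts a scaled full-grid realized covariance, which corrects the
    noise-induced bias (present when x and y carry the same noise, e.g. x = y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) - 1                                # returns on the finest grid
    # Slow scale: average of the K subsampled realized covariances.
    rc_slow = np.sum((x[K:] - x[:-K]) * (y[K:] - y[:-K])) / K
    # Fast scale: realized covariance on the full grid (J = 1).
    rc_fast = np.sum(np.diff(x) * np.diff(y))
    n_bar = (n - K + 1) / K                       # average subsample size
    return rc_slow - (n_bar / n) * rc_fast
```

Applying `tscv` to every pair of refresh-time-synchronized series yields a pairwise covariance matrix, which need not be positive semi-definite.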

We contrast how efficiently the pairwise-refresh and all-refresh methods use the rich high-frequency data. In particular, for each realization, we compute the median number of pairwise-refresh times, $\mathrm{median}_{i,j}\, n^{(i,j)}$, the minimum number of pairwise-refresh times, $n_{\min} = \min_{i,j} n^{(i,j)}$ (see (20)), and the number of all-refresh times (see (21)). The distributions of these three numbers are summarized in Figure 2. The pairwise-refresh scheme clearly uses far more data on average, yet the minimum number of pairwise-refresh times is not appreciably larger than the number of all-refresh times.
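Refresh times can be computed with the usual recursion: the first refresh time is the latest of the assets' first trades, and each subsequent one is the latest of the assets' first trades strictly after the previous refresh time. The sketch below, with hypothetical function names, computes all-refresh times and the pairwise counts summarized in Figure 2, assuming each asset's trade times are given as a sorted array. It counts refresh times; the number of returns is one fewer.

```python
import numpy as np
from itertools import combinations

def refresh_times(times_list):
    """Refresh times for a list of sorted trade-time arrays (one per asset)."""
    times_list = [np.asarray(t, dtype=float) for t in times_list]
    out = []
    tau = -np.inf
    while True:
        nxt = []
        for t in times_list:
            # index of the first trade strictly after the current refresh time
            j = np.searchsorted(t, tau, side="right")
            if j == len(t):              # some asset never trades again: stop
                return np.array(out)
            nxt.append(t[j])
        tau = max(nxt)                   # every asset has traded by this time
        out.append(tau)

def refresh_counts(times_list):
    """Number of all-refresh times, and median / minimum pairwise-refresh counts."""
    n_all = len(refresh_times(times_list))
    pairs = [len(refresh_times([times_list[i], times_list[j]]))
             for i, j in combinations(range(len(times_list)), 2)]
    return n_all, float(np.median(pairs)), min(pairs)
```

For example, with trades at (1, 2, 3, 4, 5), (1.5, 4.5) and (0.5, 1.2, 2.5, 4.8), the all-refresh times are 1.5 and 4.5, while the first and third assets alone have four pairwise-refresh times.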

Figure 2: The distributions (from left to right) of the median number of pairwise-refresh times, the minimum number of pairwise-refresh times, and the number of all-refresh times per day across 100 simulations.

To gain insight into the risk approximations, we consider 4 specific portfolios of the p = 50 assets with the weight vectors

a„ — 1,21 61 1^T „, — ('^ J- b 1 b n n\T 

with b = 3. Their daily risks are computed from the various covariance estimators and compared with the actual risk, which is computed from the latent prices. This is done across 100 simulations. The medians, robust standard deviations (RSD, defined as the interquartile range divided by 1.35) and other characteristics are summarized in Table 1.
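The robust standard deviation is easy to compute. For normal data the interquartile range is about 1.349 standard deviations, so dividing by 1.35 gives a standard-deviation estimate that ignores the tails. A minimal sketch with hypothetical names, including the "median (RSD)" layout used in Table 1:

```python
import numpy as np

def robust_sd(x):
    """Robust standard deviation: interquartile range divided by 1.35."""
    q75, q25 = np.percentile(x, [75, 25])
    return (q75 - q25) / 1.35

def median_rsd(x):
    """Format a sample as 'median (RSD)', as in Table 1."""
    return f"{np.median(x):.4f} ({robust_sd(x):.4f})"
```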

From the results, we can see that both the all-refresh TSRV and pairwise-refresh TSRV methods, especially the all-refresh TSRV method, tend to underestimate the risk in comparison with the latent risks, while the all-refresh RK method tends to overestimate it. In terms of the absolute risk difference from the oracle, the pairwise-refresh TSRV method outperforms the all-refresh TSRV method for 3 out of the 4 portfolios. The same relationship can be observed for the $L_\infty$ norm of the absolute covariance difference ($a_p$). The RK method outperforms both TSRV methods. These results are in line with our expectations.

We now study the problem of optimal portfolio allocation under gross-exposure constraints. The optimal allocation vectors are computed based on the latent covariance, the all-refresh TSCV covariance, the all-refresh RK covariance and the pairwise-refresh TSCV covariance, and their actual risks are computed based on the latent covariance matrix. The medians of these actual risks against the gross-exposure parameter c are depicted in Figure 3.
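The gross-exposure constrained problem, minimizing $w^T\hat\Sigma w$ subject to $\mathbf 1^T w = 1$ and $\|w\|_1 \le c$, is a convex quadratic program. One simple way to solve it, sketched below, is ADMM with an exact equality-constrained quadratic step and a sort-based Euclidean projection onto the $\ell_1$ ball. This is an illustrative solver we chose, not necessarily the one used in the paper; the function names, the penalty heuristic `rho` and the fixed iteration count are assumptions.

```python
import numpy as np

def project_l1_ball(v, c):
    """Euclidean projection of v onto {z : ||z||_1 <= c}, c > 0 (sort-based)."""
    if np.abs(v).sum() <= c:
        return v.copy()
    mags = np.sort(np.abs(v))[::-1]               # magnitudes, descending
    css = np.cumsum(mags)
    k = np.arange(1, len(mags) + 1)
    last = np.nonzero(mags - (css - c) / k > 0)[0][-1]
    theta = (css[last] - c) / (last + 1)          # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def gross_exposure_min_variance(cov, c, rho=None, iters=5000):
    """Approximately minimize w' cov w s.t. sum(w) = 1 and ||w||_1 <= c (ADMM)."""
    p = cov.shape[0]
    ones = np.ones(p)
    if rho is None:
        rho = np.trace(cov) / p                   # heuristic penalty scale
    a_inv = np.linalg.inv(2.0 * cov + rho * np.eye(p))
    a_inv_ones = a_inv @ ones
    z = np.full(p, 1.0 / p)                       # feasible start: equal weights
    u = np.zeros(p)                               # scaled dual variable
    for _ in range(iters):
        # w-step: min w' cov w + rho/2 ||w - z + u||^2 subject to sum(w) = 1
        a_inv_b = a_inv @ (rho * (z - u))
        nu = (ones @ a_inv_b - 1.0) / (ones @ a_inv_ones)
        w = a_inv_b - nu * a_inv_ones
        # z-step: project onto the gross-exposure ball ||z||_1 <= c
        z = project_l1_ball(w + u, c)
        u += w - z                                # dual update enforcing w = z
    return w
```

With c = 1 the result is the no-short-sale minimum-variance portfolio; as c grows, the constraint becomes inactive and the result approaches the global minimum-variance portfolio.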

Firstly, the all-refresh RK method outperforms the two TSRV methods when the gross exposure is below 3.7. The pairwise-refresh TSRV method outperforms the all-refresh TSRV method when the gross exposure is smaller than 1.2. This agrees with our expectation, since the smaller the
Table 1: Risk approximation for p = 50 and n = 100

We used the high-frequency data of 100 independent trading days. The covariance matrix of the 50 assets is estimated with each of the estimators below, and the estimated covariance matrices are used to compute the perceived risks of 4 portfolios. Relevant statistics are recorded. (All the characteristics are annualized.)

Median and Robust Standard Deviation (RSD) of Risk

Portfolio   Latent            All-refresh TSRV   All-refresh RK    Pairwise TSRV
            Median (RSD)      Median (RSD)       Median (RSD)      Median (RSD)
w1          0.4408 (0.0032)   0.3875 (0.1075)    0.4343 (0.0241)   0.4192 (0.0690)
w2          0.5916 (0.0060)   0.5229 (0.1259)    0.6230 (0.0258)   0.5936 (0.1285)
w3          0.5399 (0.0044)   0.4694 (0.0907)    0.5833 (0.0255)   0.5202 (0.0736)
w4          0.8442 (0.0077)   0.7531 (0.1748)    0.9228 (0.0418)   0.8390 (0.1789)

Median and RSD of Absolute Risk Difference from the Oracle (Latent)

Portfolio   All-refresh TSRV   All-refresh RK    Pairwise TSRV
            Median (RSD)       Median (RSD)      Median (RSD)
w1          0.0889 (0.0769)    0.0183 (0.0153)   0.0547 (0.0439)
w2          0.1054 (0.0700)    0.0344 (0.0272)   0.0804 (0.0813)
w3          0.0936 (0.0665)    0.0437 (0.0300)   0.0599 (0.0593)
w4          0.1470 (0.1022)    0.0794 (0.0393)   0.1089 (0.0941)

Median and RSD of L∞ Norm of Absolute Covariance Difference (a_p)

All-refresh TSRV   All-refresh RK    Pairwise TSRV
Median (RSD)       Median (RSD)      Median (RSD)
0.2476 (0.1460)    0.0603 (0.0270)   0.1730 (0.0746)


gross exposure is, the tighter the bound on the risk difference. Table 1 shows that the pairwise-refresh TSRV method gives a covariance estimate with higher element-wise accuracy than the all-refresh TSRV method; therefore, the former outperforms the latter where the bound is tightest (gross exposure below 1.2) in this simulation design.
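The reason a small gross exposure helps can be checked numerically. Assuming $a_p$ denotes the largest absolute element of $\hat\Sigma - \Sigma$, the elementary inequality $|R_n(w) - R(w)| = |w^T(\hat\Sigma - \Sigma)w| \le a_p\|w\|_1^2 \le a_p c^2$ holds for every allocation with $\|w\|_1 \le c$. The hypothetical helpers below compute both sides of the first inequality.

```python
import numpy as np

def max_abs_error(sigma_hat, sigma):
    """a_p: the largest absolute element-wise covariance estimation error."""
    return float(np.max(np.abs(sigma_hat - sigma)))

def risk_gap_and_bound(w, sigma_hat, sigma):
    """Return |R_n(w) - R(w)| and its bound a_p * ||w||_1^2."""
    gap = abs(w @ sigma_hat @ w - w @ sigma @ w)
    bound = max_abs_error(sigma_hat, sigma) * np.abs(w).sum() ** 2
    return float(gap), float(bound)
```

The bound grows with the square of the gross exposure, so it is tightest for the no-short-sale portfolio (c = 1).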

Secondly, all the methods produce a risk curve that slopes upward up to some point and is almost flat beyond it (the curve is clipped). This is mainly because we use only the intra-day data of 1 trading day, which is not enough to yield a stable estimate of the 50 × 50 covariance matrix. As a result, the estimated covariance matrix can be ill-conditioned. As c increases, the selected portfolios become increasingly unstable. When c reaches about 5, the selected portfolio is essentially a randomly selected portfolio. Hence, the actual risks become larger and flatten out afterwards.



Figure 3: The medians of the actual risks of the in-sample optimal allocations based on the high-frequency estimated covariance matrices using 1 trading day's intra-day data (p = 50, n = 100).

4.4 Out-of-sample Optimal Allocation

One of the main purposes of this paper is to investigate the comparative advantage of high-frequency based methods over the low-frequency based method, especially in the context of portfolio investment. Hence, it is essential to run the following out-of-sample investment test, which includes both the high-frequency and low-frequency based approaches. Moreover, since the latent asset prices are unknown in the empirical studies, the out-of-sample test should be designed so that it can also be conducted there.

We simulate the prices of 50 traded assets using models (24) and (25) with microstructure noise for 200 trading days (numbered day 1, day 2, ..., day 200) and record all the tick-by-tick trading times and trading prices of the assets. We assume that there are no overnight jumps in asset prices, meaning that an asset's closing price on one trading day always equals its opening price on the next trading day.

We start investing 1 unit of capital in the pool of assets with the low-frequency and high-frequency based strategies from day 101 (the portfolios are bought at the opening of day 101). For the low-frequency strategy, we use the previous 100 trading days' daily closing prices to compute the sample covariance matrix and make the portfolio allocation accordingly under the gross-exposure constraints. For the all-refresh high-frequency strategies, we use the previous h = 10 trading days' tick-by-tick trading data, synchronize the trades of the assets with all-refresh times, and then apply the realized kernel or TSCV to estimate the integrated volatility matrix and make the portfolio allocation. For the pairwise-refresh high-frequency strategy, we use pairwise-refresh times to synchronize each pair of assets and apply TSCV to estimate the integrated covariance of that pair. With the projection technique (23), the resulting TSCV integrated volatility matrix can always be transformed into a positive semi-definite matrix, which facilitates the optimization.
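One common way to turn a symmetric estimate into a positive semi-definite matrix is to set its negative eigenvalues to zero, which gives the nearest positive semi-definite matrix in Frobenius norm. The sketch below uses this eigenvalue-clipping rule as a stand-in; it is an assumption here, and the projection (23) used in the paper may differ. The function name is hypothetical.

```python
import numpy as np

def psd_projection(s):
    """Nearest positive semi-definite matrix to s in Frobenius norm."""
    s = 0.5 * (s + s.T)                       # symmetrize first
    vals, vecs = np.linalg.eigh(s)
    # Rebuild with negative eigenvalues replaced by zero (their positive parts).
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T
```

For instance, a 3 × 3 "correlation" matrix with off-diagonal entries 0.9, 0.9 and -0.9 has eigenvalues 1.9, 1.9 and -0.8; the projection removes the negative direction and leaves the other two unchanged.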

We run two investment strategies. In the first strategy, the portfolio is held for τ = 1 trading day before we re-estimate the covariance structure and adjust the portfolio weights accordingly. The second strategy is the same as the first except that the portfolio is held for τ = 5 trading days before rebalancing.
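The rolling rebalancing schedule can be sketched as a list of (rebalance day, estimation days) pairs. This minimal sketch, with hypothetical names, only enumerates days; the estimator and optimizer are left to the caller. Day numbering follows the text (investment from day 101 to day 200, estimation window of 10 days for the high-frequency and 100 days for the low-frequency strategies).

```python
def rebalance_schedule(first_day=101, last_day=200, tau=1, window=10):
    """(rebalance day, estimation days) pairs for a rolling backtest.

    On each rebalance day the covariance is re-estimated from the previous
    `window` trading days, and the portfolio is then held for `tau` days.
    """
    schedule = []
    for day in range(first_day, last_day + 1, tau):
        est_days = list(range(day - window, day))   # days day-window, ..., day-1
        schedule.append((day, est_days))
    return schedule
```

With τ = 1 there are 100 rebalances; with τ = 5 there are 20, on days 101, 106, ..., 196.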

Over the investment horizon (from day 101 to day 200 in this case), we record the 15-minute portfolio returns based on the latent prices of the assets, the variation of the portfolio weights across the 50 assets, and other relevant characteristics. Although 100 trading days may appear short, computing 15-minute returns increases the amount of data available for computing the risk by a factor of 26, the number of 15-minute intervals in a 23400-second trading day.

We study these portfolio features over a whole range of gross-exposure constraints, from c = 1, which corresponds to the no-short-sale portfolio strategy, to c = 3. This is usually the relevant range of gross exposure for investment purposes.

The standard deviations and other characteristics of the strategy for τ = 1 are presented in Table 2 (the case τ = 5 is very similar and therefore omitted). The standard deviations represent the actual risks of the strategies. As we only optimize the risk profile, we should not put much weight on the returns of the optimal portfolios; they cannot even be estimated accurately over such a short investment horizon. Figures 4 and 5 provide graphical details of these characteristics for both τ = 1 and τ = 5.

Table 2: The out-of-sample performance of daily-rebalanced optimal portfolios with gross-exposure constraint

We simulate one trial of intra-day trading data for 50 assets, make portfolio allocations for 100 trading days and rebalance daily. The standard deviations and other characteristics of these portfolios are recorded. All the characteristics are annualized (Max Weight: median of maximum weights; Min Weight: median of minimum weights; No. of Long: median of the numbers of long positions whose weights exceed 0.001; No. of Short: median of the numbers of short positions whose absolute weights exceed 0.001).

Methods            Std Dev %   Max Weight   Min Weight   No. of Long   No. of Short

Low Frequency Sample Covariance Matrix Estimator
c = 1 (No short)   16J9        (U9          -0.00        13            0
c = 2              16.44       0.14         -0.05        28.5          20
c = 3              16.45       0.14         -0.05        28.5          20

High Frequency All-Refresh TSRV Covariance Matrix Estimator
c = 1 (No short)   16.08       0.20         -0.00        15            0
c = 2              14.44       0.14         -0.05        30            19
c = 3              14.44       0.14         -0.05        30            19

High Frequency All-Refresh RK Covariance Matrix Estimator
c = 1 (No short)   17.20       0.22         -0.00        12.5          0
c = 2              20.35       0.22         -0.09        22            18
c = 3              31.37       0.34         -0.19        23.5          23

High Frequency Pairwise-Refresh TSRV Covariance Matrix Estimator
c = 1 (No short)   15.34       OAS          -0.00        15            0
c = 2              12.72       0.13         -0.03        31            18
c = 3              12.72       0.13         -0.03        31            18



Figure 4: Out-of-sample performance of daily-rebalanced optimal portfolios based on high-frequency and low-frequency estimation of the integrated covariance matrix: (a) annualized risk of portfolios; (b) maximum weight of allocations.

Figure 5: Out-of-sample performance of optimal portfolios based on high-frequency and low-frequency estimation of the integrated covariance matrix with holding period τ = 5.

For both holding periods, τ = 1 and τ = 5, the all-refresh TSRV and pairwise-refresh TSRV approaches significantly outperform the low-frequency approach in terms of risk profile over the whole range of gross-exposure constraints. This supports our theoretical results and intuitions. The shorter estimation window allows these two high-frequency approaches to deliver consistently better results than the low-frequency approach. The low-frequency strategy, in turn, significantly outperforms the equal-weight portfolio (see Figures 4 and 5). Slightly surprisingly, the low-frequency approach also outperforms the all-refresh RK approach. We believe this is likely due to the instability of the estimated realized kernel covariance matrix.

All the risk curves attain their minimum around c = 1.2 (see Figures 4 and 5), which again meets our expectation, since that is presumably the point where the marginal increase in estimation error outpaces the marginal decrease in specification error. This, coupled with the results of the empirical studies, gives us some guidance on which gross-exposure constraint to use in investment practice.

Firstly, the pairwise method outperforms the all-refresh method, as expected. Secondly, the risk of the low-frequency approach increases only mildly as the gross-exposure constraint increases. A possible explanation is that only 50 assets are considered, so the accumulation of estimation error does not dominate as badly as we feared, given that the low-frequency covariance sampling window is the previous 100 trading days. Another possible reason is that, since the data are generated by a stationary stochastic model, the low-frequency approach may be able to capture some of the stationarity of the model.

In terms of portfolio weights, neither the low-frequency nor the high-frequency optimal no-short-sale portfolios are well diversified: all approaches assign a concentrated weight of around 20% to a single asset. Their portfolio risks can be improved by relaxing the gross-exposure constraint (Figures 4 and 5).

5 Empirical Studies 

The risk minimization problem (6) has important applications in asset allocation. We demonstrate its application to stock portfolio investment in the 30 Dow Jones Industrial Average (DJIA) constituent stocks (the 30 DJIA stocks, for short).

The Dow Jones Industrial Average is one of several stock market indices created by Charles Dow, the editor of the Wall Street Journal and a co-founder of Dow Jones and Company. It is an index that shows how 30 large, publicly owned companies based in the United States have traded during a standard trading session in the stock market. We allocate the portfolio among the constituents of the index as of Sep 30, 2008 (the individual components of the DJIA are occasionally changed as market conditions warrant).

To make the asset allocation, we use the high-frequency data of the 30 DJIA stocks from Jan 1, 2008 to Sep 30, 2008. These stocks are highly liquid. The trading intensity on each trading day is summarized by the maximum, minimum and median numbers of trades among these 30 stocks. The distributions of these summary statistics over the 9 months (189 trading days) are summarized in Figure 6. The period covers the onset of the 2008 financial crisis.

Figure 6: The distributions (from left to right) of the maximum, minimum and median numbers of trades of the 30 DJIA stocks per day, from Jan 02, 2008 to Sep 30, 2008 (189 trading days).

Figure 7: Out-of-sample performance of daily-rebalanced optimal portfolios for the 30 Dow Jones constituent stocks with investment period from May 27, 2008 to Sep 30, 2008 (89 trading days): (a) annualized risk of portfolios; (b) maximum weight of allocations.

At the end of each holding period of τ = 1 or τ = 5 trading days in the investment period (from May 27, 2008 to Sep 30, 2008), the covariance matrix of the 30 stocks is estimated with each of the following estimators: the sample covariance of the last 100 trading days' daily returns (low frequency), the all-refresh TSCV estimator of the last 10 trading days, the all-refresh RK estimator of the last 10 trading days (the bandwidth H of the realized kernel is chosen to be 1, since the risk profile for H = 1 is better than for other choices of H), and the pairwise-refresh TSCV estimator of the last 10 trading days. These estimated covariance matrices are used to construct optimal portfolios under various exposure constraints. For τ = 5, we do not count the overnight risks of the portfolio, because overnight price jumps are often due to the arrival of news and are irrelevant to the topic of our study. The standard deviations and other characteristics of these portfolio returns for τ = 1 are presented in Table 3, together with the characteristics of an equally weighted portfolio of the 30 DJIA stocks rebalanced daily. The standard deviations represent the actual risks and are computed from 15-minute returns. Figures 7 and 8 provide graphical details of these characteristics for both τ = 1 and τ = 5.

Table 3 and Figures 7 and 8 reveal that, in terms of the portfolio's actual risk, the all-refresh TSRV and pairwise-refresh TSRV strategies perform at least as well as the low-frequency based strategy when the gross exposure is small and significantly outperform it when the gross exposure
Table 3: The out-of-sample performance of daily-rebalanced optimal portfolios of the 30 DJIA stocks

Methods            Std Dev %   Max Weight   Min Weight   No. of Long   No. of Short

Low Frequency Sample Covariance Matrix Estimator
c = 1 (No short)   12.73       0.50         -0.00        8             0
c = 2              14.27       0.44         -0.12        16            10
c = 3              15.12       0.45         -0.18        18            12

High Frequency All-Refresh TSCV Covariance Matrix Estimator
c = 1 (No short)   12.55       0.40         -0.00        8             0
c = 2              12.36       0.36         -0.10        17            12
c = 3              12.50       0.36         -0.10        17            12

High Frequency All-Refresh RK Covariance Matrix Estimator
c = 1 (No short)   13.69       0.22         -0.00        14            0
c = 2                          0.25         -0.15        17            10
c = 3              16.55       0.30         -0.23        17

High Frequency Pairwise-Refresh TSCV Covariance Matrix Estimator
c = 1 (No short)   12.54       0.39         -0.00        9             0
c = 2                          0.35                      17            12
c = 3                          0.35         -0.08        17
is large. Both facts support our theoretical results and intuitions. Even with a covariance estimation window 10 times as long, the low-frequency approach still cannot perform better than the high-frequency TSRV approaches, which affirms our belief that the high-frequency TSRV approaches can significantly shorten the necessary covariance estimation window and better capture the short-term time-varying covariation structure (the "local" covariance). These results, together with those presented in the simulation section, lend strong support to this statement.

Again, the fact that the all-refresh RK strategy is outperformed by the low-frequency strategy could be due to the instability of the estimated realized kernel covariance matrix.

As the gross-exposure constraint increases, the portfolio risk of the low-frequency approach increases drastically relative to those of the high-frequency TSRV approaches. The reason could be a combination of two facts: the low-frequency approach does not produce a well-conditioned covariance estimate because of the lack of data, and it can only capture the long-run covariation but not the "local" covariance dynamics. The portfolio risk of the high-frequency TSRV approaches increases only moderately as the gross-exposure constraint increases. From a practitioner's standpoint, this is another comparative advantage of the high-frequency TSRV approaches: investors need not be much concerned about the choice of the gross-exposure constraint when using them.
Figure 8: Out-of-sample performance of 5-day-rebalanced optimal portfolios for the 30 Dow Jones constituent stocks with investment period from May 27, 2008 to Sep 30, 2008 (89 trading days): (a) annualized risk of portfolios; (b) maximum weight of allocations.



Neither the low-frequency nor the high-frequency optimal no-short-sale portfolios are sufficiently diversified. Their risk profiles can be improved by relaxing the gross-exposure constraint to around c = 1.2, i.e., allowing 10% short positions and 110% long positions. The no-short-sale portfolios under all approaches have maximum portfolio weights between 22% and 50%. As the gross-exposure constraint is relaxed, the maximum weight of the pairwise-refresh TSRV approach reaches its smallest value, around 30% to 34%, while that of the low-frequency approach only goes down to around 40%. This is another comparative advantage of the high-frequency approach in practice, as most investors consider a portfolio with less weight concentration preferable.
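The long/short split implied by a given c follows directly from the two constraints. Writing the weights as $w = w^+ - w^-$, where $w^+$ and $w^-$ are the positive and negative parts, and assuming the gross-exposure constraint is binding ($\|w\|_1 = c$):

$$\mathbf 1^T w^+ - \mathbf 1^T w^- = 1, \qquad \mathbf 1^T w^+ + \mathbf 1^T w^- = c \quad\Longrightarrow\quad \mathbf 1^T w^+ = \frac{1+c}{2}, \qquad \mathbf 1^T w^- = \frac{c-1}{2}.$$

For c = 1.2 this gives a total long weight of 1.1 and a total short weight of 0.1; for c = 2 and c = 3 the short side can reach 50% and 100%, respectively.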

Another interesting fact is that the equally weighted daily-rebalanced portfolio of the 30 DJIA stocks has an annualized return of only -10%, while the DJIA went down 13.5% during the same period (May 27, 2008 to Sep 30, 2008), an annualized return of -38.3%. The difference arises because we intentionally avoided holding portfolios overnight and were hence not affected by overnight price jumps. In the turbulent financial market of May to September 2008, this means that our portfolio strategies were not affected by the numerous sizeable downward jumps. Those jumps were mainly caused by news of a distressed economy and distressed corporations, and such moves could deviate far from what the previously held covariation structure dictates.
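The annualization of the DJIA figure can be checked with simple arithmetic. Assuming 252 trading days per year (the convention is not stated in the text), simple scaling of -13.5% over 89 trading days gives about -38.2%, consistent with the reported -38.3% up to rounding of the 13.5% figure, while compounding would give about -33.7%. The helper names below are hypothetical.

```python
TRADING_DAYS_PER_YEAR = 252   # assumed convention; not stated in the text

def annualize_simple(period_return, n_days):
    """Scale a period return linearly to one year."""
    return period_return * TRADING_DAYS_PER_YEAR / n_days

def annualize_compound(period_return, n_days):
    """Compound a period return to one year."""
    return (1.0 + period_return) ** (TRADING_DAYS_PER_YEAR / n_days) - 1.0
```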



6 Conclusion 



We advocate portfolio selection with gross-exposure constraints (Fan et al., 2008b). It is less sensitive to errors in covariance estimation and is immune to noise accumulation. The out-of-sample portfolio performance depends on the expected volatility in the holding period. This volatility can at best be approximated, and the gross-exposure constraints help reduce the error accumulation in the approximations.

Two approaches are proposed for using high-frequency data to estimate the integrated covariance: the "all-refresh" and "pairwise-refresh" methods. The latter retains far more data on average and hence estimates the covariance more precisely element by element. Yet the pairwise-refresh estimates are typically not positive semi-definite, and projections are needed for the convex optimization algorithms. The projection somewhat distorts the performance of the pairwise-refresh strategies.

The use of high-frequency financial data significantly increases the available sample size for volatility estimation, and hence shortens the time window for estimation and adapts better to local covariations. Our theoretical observations are supported by the empirical studies and simulations, in which we demonstrate convincingly that the high-frequency based strategies generally outperform the low-frequency based one.

With the gross-exposure constraint, the impact of the size of the candidate pool for portfolio allocation is limited. We derive concentration inequalities to demonstrate this theoretically, and the simulation and empirical studies lend further support to it.



A APPENDIX. Conditions and Proofs
A.1 Conditions

The following conditions are needed. For simplicity, we state the conditions for the integrated covariation (Theorem 2). The conditions for the integrated volatility (Theorem 1) are simply those with Y = X.

Condition 1. $\mu_t^{(X)} = \mu_t^{(Y)} = 0$.

Condition 2. $0 < \sigma_t^{(X)}, \sigma_t^{(Y)} < C_\sigma < \infty$, $\forall t \in [0, 1]$.

Condition 3. The observation times are independent of the X and Y processes. The synchronized observation times for the X and Y processes satisfy $\sup_{1 \le j \le n} n\cdot(v_j - v_{j-1}) \le C_a < \infty$, where n is the observation frequency and $V = \{v_0, v_1, \dots, v_n\}$ is the set of refresh times of the processes X and Y.

Condition 4. For the TSCV parameters, we consider the case $J = 1$ ($n_J = n$) and $n_K = O(n^{2/3})$ such that

2 - ^ - 

Condition 5. The processes and are independent. 
Conditions 1 and 4 are imposed for simplicity; they can be removed at the expense of lengthier proofs. For a short horizon and high-frequency data, whether Condition 1 holds has little impact on the investment. For estimating integrated volatility, the synchronized times become the observation times $\{\tau_{n,j}\}$, and Conditions 3 and 5 become

$$\sup_{1 \le j \le n} n\cdot(\tau_{n,j} - \tau_{n,j-1}) \le C_a < \infty \qquad (26)$$

and

2 - A - 

A.2 Lemmas

We need the following three lemmas for the proofs of Theorems 1 and 2. In particular, Lemma 2 is an exponential-type inequality for random variables with arbitrary dependence that have a finite moment generating function. It is useful for many statistical learning problems. Lemma 3 is a concentration inequality for the realized volatility based on a discretely observed latent process.

Lemma 1. When $Z \sim N(0, 1)$, for any $|\theta| < 1/4$,

$$E\exp\{\theta(Z^2 - 1)\} \le \exp(2\theta^2).$$

Proof. Using the moment generating function of $Z^2 \sim \chi_1^2$, we have

$$E\exp\{\theta(Z^2 - 1)\} = \exp\{-\tfrac{1}{2}\log(1 - 2\theta) - \theta\}.$$

Let $g(x) = \log(1 - x) + x + x^2$ with $|x| < 1/2$. Then $g'(x) = x(1 - 2x)/(1 - x)$ is nonnegative when $x \in [0, 1/2)$ and negative when $x \in (-1/2, 0)$. In other words, $g(x)$ attains its minimum at $0$, namely $g(x) \ge g(0) = 0$ for $|x| < 1/2$. Consequently, for $|\theta| < 1/4$,

$$\log(1 - 2\theta) \ge -2\theta - (2\theta)^2.$$

Hence,

$$E\exp\{\theta(Z^2 - 1)\} \le \exp\{\theta + 2\theta^2 - \theta\} = \exp(2\theta^2).$$
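Lemma 1 can be sanity-checked numerically. The sketch below compares the exact log moment generating function from the proof with the log of the bound on a grid of $\theta \in [-1/4, 1/4]$; the grid and the function name are illustrative choices.

```python
import numpy as np

def log_mgf_centered_chi2(theta):
    """log E exp{theta (Z^2 - 1)} for Z ~ N(0, 1); finite for theta < 1/2."""
    return -0.5 * np.log1p(-2.0 * theta) - theta

thetas = np.linspace(-0.25, 0.25, 1001)
slack = 2.0 * thetas ** 2 - log_mgf_centered_chi2(thetas)   # log bound minus log MGF
```

The slack is nonnegative on the whole grid and vanishes only at $\theta = 0$, where both sides equal zero.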
Lemma 2. For a set of random variables $X_i$, $i = 1, \dots, K$, if

$$E\exp(\theta X_i) \le \exp(C_2\theta^2) \quad \text{when } |\theta| < C_1, \qquad (27)$$

for some two positive constants $C_1$ and $C_2$, then

$$P\Big\{\Big|\sum_{i=1}^K w_iX_i\Big| > x\Big\} \le 4\exp\Big(-\frac{x^2}{16C_2w^2}\Big), \quad \text{when } 0 < x < 2C_1C_2,$$

where the $w_i$'s are weights satisfying $\sum_{i=1}^K |w_i| \le w \in [1, \infty)$.

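Proof. By the Markov inequality, for $0 < \theta < C_1$, we have

$$P\{|X_i| > x\} \le \exp(-\theta x)\,E\exp(\theta|X_i|) \le 2\exp(C_2\theta^2 - \theta x), \qquad (28)$$

where the second inequality uses $e^{\theta|X_i|} \le e^{\theta X_i} + e^{-\theta X_i}$ and (27). Taking $\theta = x/(2C_2)$, we have

$$P\{|X_i| > x\} \le 2\exp\Big(-\frac{x^2}{4C_2}\Big), \quad \text{when } 0 < x < x_0, \qquad (29)$$

where $x_0 = 2C_1C_2$.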

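For a small constant $\zeta > 0$ to be specified later, let

$$g_\zeta(x) = \begin{cases} \exp(\zeta x^2), & 0 \le x \le x_0, \\ \exp(a_\zeta + b_\zeta x), & x > x_0, \end{cases}$$

where $a_\zeta = -\zeta x_0^2$ and $b_\zeta = 2\zeta x_0$. Then $g_\zeta(x)$ is a continuously differentiable, increasing, convex function on $[0, \infty)$. It follows from the Markov inequality and convexity that, for $w^* = \sum_{i=1}^K |w_i|$,

$$P\Big\{\Big|\sum_{i=1}^K w_iX_i\Big| > x\Big\} \le g_\zeta(x)^{-1}\,E g_\zeta\Big(\Big|\sum_{i=1}^K w_iX_i\Big|\Big) \le g_\zeta(x)^{-1}(w^*)^{-1}\sum_{i=1}^K |w_i|\,E g_\zeta(w^*|X_i|), \qquad (30)$$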

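which is further bounded by $4g_\zeta(x)^{-1}$ if we can show that 4 is a common bound for $\{E g_\zeta(w|X_i|)\}$, since $w^* \le w$ and $g_\zeta$ is increasing. Note that, by (28), for $wb_\zeta < \theta < C_1$,

$$\lim_{x \to \infty} g_\zeta(x)\,P\{w|X_i| > x\} = 0.$$

It follows from integration by parts that

$$E g_\zeta(w|X_i|) = 1 + \int_0^{x_0} 2\zeta x\exp(\zeta x^2)\,P\{w|X_i| > x\}\,dx + \int_{x_0}^\infty b_\zeta\exp(a_\zeta + b_\zeta x)\,P\{w|X_i| > x\}\,dx. \qquad (31)$$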
By (29), the second term in (31) is bounded by

$$\int_0^{x_0} 2\zeta x\exp(\zeta x^2)\cdot 2\exp(-C_3x^2)\,dx = \frac{2\zeta}{C_3 - \zeta}\big(1 - \exp\{(\zeta - C_3)x_0^2\}\big),$$

where $C_3 = (4C_2w^2)^{-1}$. Using (28), the third term in (31) is bounded by

$$2b_\zeta\exp(a_\zeta)\int_{x_0}^\infty \exp(b_\zeta x + C_2\theta^2 - \theta w^{-1}x)\,dx = \frac{2b_\zeta\exp(a_\zeta + C_2\theta^2 + b_\zeta x_0 - \theta w^{-1}x_0)}{\theta w^{-1} - b_\zeta},$$

provided that $wb_\zeta < \theta \le C_1$.

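Choosing further $\theta$ to satisfy $a_\zeta + C_2\theta^2 + b_\zeta x_0 - \theta w^{-1}x_0 < 0$, and $\zeta < C_3$, it follows from the calculation in the previous paragraph that

$$E\{g_\zeta(w|X_i|)\} \le 1 + \frac{2\zeta}{C_3 - \zeta} + \frac{2b_\zeta}{\theta w^{-1} - b_\zeta}.$$

Taking $\zeta = \zeta_0 = C_3/4 = (16C_2w^2)^{-1}$ and $\theta = \theta_0 = C_1/(2w)$, which satisfy the above conditions (indeed, $wb_{\zeta_0} = C_1/(4w) < \theta_0 < C_1$ and $a_{\zeta_0} + C_2\theta_0^2 + b_{\zeta_0}x_0 - \theta_0w^{-1}x_0 = -C_1^2C_2/(2w^2) < 0$), it follows from direct calculation that

$$E\{g_{\zeta_0}(w|X_i|)\} \le 1 + \frac23 + 2 = \frac{11}{3} < 4.$$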

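To summarize, we have shown that

$$Eg_{\zeta_0}(w|X_i|) < 4 \quad \text{for all } i = 1, \dots, K.$$

Therefore, continuing from (30), we have

$$P\Big\{\Big|\sum_{i=1}^K w_iX_i\Big| > x\Big\} \le 4g_{\zeta_0}(x)^{-1} = 4\exp\Big\{-\frac{x^2}{16C_2w^2}\Big\}, \quad \text{when } 0 \le x \le 2C_1C_2.$$

This completes the proof of the lemma.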
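The constants chosen at the end of the proof can be checked with exact rational arithmetic. The sketch below takes the choices $\zeta_0 = (16C_2w^2)^{-1}$ and $\theta_0 = C_1/(2w)$ stated above; it verifies the continuity of $g_\zeta$ at $x_0$ and the side conditions on $\theta$, and evaluates the resulting bound on $Eg_{\zeta_0}(w|X_i|)$ for a few hypothetical values of $C_1$, $C_2$ and $w$ (the function name is ours).

```python
from fractions import Fraction

def lemma2_check(C1, C2, w):
    """Check zeta_0 = 1/(16 C2 w^2) and theta_0 = C1/(2w) from the proof of Lemma 2.

    Returns (conditions_hold, upper bound on E g_zeta(w |X_i|)), computed exactly.
    """
    C1, C2, w = Fraction(C1), Fraction(C2), Fraction(w)
    x0 = 2 * C1 * C2                 # (29) holds for 0 <= x <= x0
    C3 = 1 / (4 * C2 * w**2)         # tail exponent of w|X_i| on [0, x0]
    zeta = 1 / (16 * C2 * w**2)      # zeta_0 = C3 / 4
    theta = C1 / (2 * w)             # theta_0
    a = -zeta * x0**2                # a_zeta
    b = 2 * zeta * x0                # b_zeta
    conditions = [
        zeta * x0**2 == a + b * x0,                       # g_zeta is continuous at x0
        w * b < theta < C1,                               # range needed when using (28)
        a + C2 * theta**2 + b * x0 - theta * x0 / w < 0,  # exponent in the third term of (31)
        zeta < C3,
    ]
    bound = 1 + 2 * zeta / (C3 - zeta) + 2 * b / (theta / w - b)
    return all(conditions), bound

print(lemma2_check(1, 1, 14))   # (True, Fraction(11, 3))
```

Because all constants scale out, the bound equals $11/3$ for any $C_1, C_2 > 0$ and $w \ge 1$; using `Fraction` keeps that comparison free of rounding.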

Lemma 3 (A Concentration Inequality for Realized Volatility Based on a Latent Process). For a one-dimensional diffusion process $X_t$ that satisfies Conditions 1-2, when one observes $X_t$ at times $v_i$, $i = 1, \dots, n$, and the observation frequency satisfies Condition 3 (see (26)), then, for $x \in [0, c\sqrt n]$,

$$P\Big\{n^{1/2}\Big|[X,X]_1 - \int_0^1\sigma_t^2\,dt\Big| > x\Big\} \le 4\exp\{-Cx^2\},$$

where $[X,X]_1 = \sum_{i=1}^n (X_{v_i} - X_{v_{i-1}})^2$ is the realized volatility based on the discretely observed $X$ process; the constants $c$ and $C$ can be taken as in (34).
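Proof. Let $X^*_t = X_t - X_0 = \int_0^t\sigma_s\,dW_s$. By the time-change theorem for martingales (see, for example, Theorem 4.6 of Karatzas and Shreve (2000)), if $\tau_t = \inf\{s : [X^*]_s > t\}$, where $[X^*]_s$ is the quadratic variation process, then $B_t := X^*_{\tau_t}$ is a Brownian motion with respect to $\{\mathcal F_{\tau_t}\}_{0 \le t < \infty}$. We then have that

$$E\exp\Big\{\theta\Big((X^*_u)^2 - \int_0^u\sigma_s^2\,ds\Big)\Big\} = E\exp\big\{\theta\big(B^2_{[X^*]_u} - [X^*]_u\big)\big\}.$$

Note further that, for any $t$, $[X^*]_t$ is a stopping time with respect to $\{\mathcal F_{\tau_s}\}_{0 \le s < \infty}$, and that the process $\exp\{\theta(B_s^2 - s)\}$ is a submartingale for any $\theta$ (by Itô's formula, its drift is $2\theta^2B_s^2\exp\{\theta(B_s^2 - s)\} \ge 0$). By the optional sampling theorem, using $[X^*]_u \le C_\sigma^2u$ (a bounded stopping time), we have

$$E\exp\big\{\theta\big(B^2_{[X^*]_u} - [X^*]_u\big)\big\} \le E\exp\big\{\theta\big(B^2_{C_\sigma^2u} - C_\sigma^2u\big)\big\}.$$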
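Therefore, under Conditions 2 and 3, we have

$$E\Big(\exp\Big\{\theta\sqrt n\Big((\Delta X_i)^2 - \int_{v_{i-1}}^{v_i}\sigma_t^2\,dt\Big)\Big\}\,\Big|\,\mathcal F_{v_{i-1}}\Big) \le E\exp\Big\{\theta\sqrt n\Big(B^2_{C_\sigma^2C_\Delta/n} - \frac{C_\sigma^2C_\Delta}{n}\Big)\Big\} = E\exp\Big\{\theta\frac{C_\sigma^2C_\Delta}{\sqrt n}(Z^2 - 1)\Big\}, \tag{32}$$

where $Z \sim N(0,1)$ and $\Delta X_i = X_{v_i} - X_{v_{i-1}}$.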

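It follows from the law of iterated expectations and (32) that

$$\begin{aligned}
E\exp\Big\{\theta\sqrt n\Big([X,X]_1 - \int_0^1\sigma_t^2\,dt\Big)\Big\}
&= E\Big[\exp\Big\{\theta\sqrt n\Big(\sum_{i=1}^{n-1}(\Delta X_i)^2 - \int_0^{v_{n-1}}\sigma_t^2\,dt\Big)\Big\}\\
&\qquad \times E\Big(\exp\Big\{\theta\sqrt n\Big((\Delta X_n)^2 - \int_{v_{n-1}}^{v_n}\sigma_t^2\,dt\Big)\Big\}\,\Big|\,\mathcal F_{v_{n-1}}\Big)\Big]\\
&\le E\exp\Big\{\theta\sqrt n\Big(\sum_{i=1}^{n-1}(\Delta X_i)^2 - \int_0^{v_{n-1}}\sigma_t^2\,dt\Big)\Big\}\,E\exp\Big\{\theta\frac{C_\sigma^2C_\Delta}{\sqrt n}(Z^2 - 1)\Big\},
\end{aligned}$$

where $Z \sim N(0,1)$. Repeating the process above, we obtain

$$E\exp\Big\{\theta\sqrt n\Big([X,X]_1 - \int_0^1\sigma_t^2\,dt\Big)\Big\} \le \Big[E\exp\Big\{\theta\frac{C_\sigma^2C_\Delta}{\sqrt n}(Z^2 - 1)\Big\}\Big]^n.$$

By Lemma 1, we have, for $|\theta| < \sqrt n/(4C_\sigma^2C_\Delta)$,

$$E\exp\Big\{\theta\sqrt n\Big([X,X]_1 - \int_0^1\sigma_t^2\,dt\Big)\Big\} \le \exp\{2\theta^2C_\sigma^4C_\Delta^2\}. \tag{33}$$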
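By Lemma 2 (with $K = 1$, $w = 1$, $C_1 = \sqrt n/(4C_\sigma^2C_\Delta)$ and $C_2 = 2C_\sigma^4C_\Delta^2$), we have

$$P\Big\{n^{1/2}\Big|[X,X]_1 - \int_0^1\sigma_t^2\,dt\Big| > x\Big\} \le 4\exp\Big\{-\frac{x^2}{32C_\sigma^4C_\Delta^2}\Big\}, \tag{34}$$

when $0 \le x \le C_\sigma^2C_\Delta\sqrt n$.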
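To see Lemma 3 at work, one can simulate the simplest case it covers. The sketch below assumes constant volatility $\sigma_t \equiv \sigma$ (so $C_\sigma = \sigma$), equally spaced observations $v_i = i/n$ (so $C_\Delta = 1$) and no drift, and compares the empirical tail of $\sqrt n\,|[X,X]_1 - \sigma^2|$ with the bound $4\exp\{-x^2/(32C_\sigma^4C_\Delta^2)\}$ of (34). The seed, sample sizes and values of $x$ are arbitrary choices.

```python
import numpy as np

def scaled_rv_errors(n, sigma, n_paths, rng):
    """sqrt(n) * ([X,X]_1 - int_0^1 sigma^2 dt) for X_t = sigma * W_t sampled at i/n."""
    increments = sigma * rng.standard_normal((n_paths, n)) / np.sqrt(n)
    realized_vol = np.sum(increments**2, axis=1)      # [X,X]_1 on each simulated path
    return np.sqrt(n) * (realized_vol - sigma**2)     # integrated variance is sigma^2

rng = np.random.default_rng(2024)
n, sigma, C_delta = 500, 1.0, 1.0        # equal spacing: n (v_i - v_{i-1}) = 1
errors = scaled_rv_errors(n, sigma, n_paths=5000, rng=rng)
C = 1.0 / (32 * sigma**4 * C_delta**2)   # constant in (34), with C_sigma = sigma
for x in (2.0, 4.0, 8.0, 10.0):          # all within 0 <= x <= C_sigma^2 C_delta sqrt(n)
    empirical = np.mean(np.abs(errors) > x)
    bound = 4 * np.exp(-C * x**2)
    print(f"x={x:4.1f}  empirical={empirical:.4f}  bound={bound:.4f}")
```

In this setting $\sqrt n([X,X]_1 - \sigma^2)$ has variance exactly $2\sigma^4$, so the empirical tails lie far below the bound, which is not tight in this simple case.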



A.3 Proof of Theorem 1

We first prove the results conditional on the set of observation times $\mathcal V$. Recall the notation introduced in Sections 3.2 and 3.3. Let $n$ be the observation frequency. For simplicity of notation, and without ambiguity, we will write $\tau_{n,i}$ as $\tau_i$ and $\sigma_t^{(X)}$ as $\sigma_t$. Denote the TSRV based on the unobserved latent process by

$$\overline{\langle X,X\rangle}_1 = [X,X]_1^{(K)} - \frac{n_K}{n_J}[X,X]_1^{(J)}, \tag{35}$$

where $[X,X]_1^{(K)} = K^{-1}\sum_{i=K}^n (X_{\tau_i} - X_{\tau_{i-K}})^2$. Then, from the definition, the TSRV $\langle X,X\rangle_1$ based on the noisy observations satisfies

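$$\begin{aligned}
\langle X,X\rangle_1 &= [X,X]_1^{(K)} + [\varepsilon^X,\varepsilon^X]_1^{(K)} + 2[X,\varepsilon^X]_1^{(K)} - \frac{n_K}{n_J}\Big([X,X]_1^{(J)} + [\varepsilon^X,\varepsilon^X]_1^{(J)} + 2[X,\varepsilon^X]_1^{(J)}\Big)\\
&= \frac1K\sum_{l=0}^{K-1}V_K^{(l)} - \frac{n_K}{n_J}[X,X]_1^{(J)} + R_1 + R_2,
\end{aligned}\tag{36}$$

where $R_1 = [\varepsilon^X,\varepsilon^X]_1^{(K)} - \frac{n_K}{n_J}[\varepsilon^X,\varepsilon^X]_1^{(J)}$, $R_2 = 2[X,\varepsilon^X]_1^{(K)} - 2\frac{n_K}{n_J}[X,\varepsilon^X]_1^{(J)}$, and

$$V_K^{(l)} = \sum_{i=1}^{n_K}\big(X_{\tau_{iK+l}} - X_{\tau_{(i-1)K+l}}\big)^2, \quad l = 0, 1, \dots, K-1.$$

Note that we have assumed above that $n_K$ is an integer, to simplify the presentation.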
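The quantities in (35)-(36) can be computed directly. The following is a minimal sketch (NumPy; function names are ours) of the subsampled realized volatility $[Y,Y]^{(K)}$, the TSRV with $J = 1$ and the subgrid realized volatilities $V_K^{(l)}$, applied to a simulated Brownian path with $\sigma = 1$ plus i.i.d. Gaussian noise. The normalization $n_K = (n-K+1)/K$, the choice $K \approx n^{2/3}$ and the noise level are assumptions made for illustration; the code does not need $n_K$ to be an integer.

```python
import numpy as np

def subsampled_rv(y, K):
    """[Y,Y]^{(K)} = (1/K) * sum_{i=K}^{n} (Y_i - Y_{i-K})^2 for y = (Y_0, ..., Y_n)."""
    return np.sum((y[K:] - y[:-K]) ** 2) / K

def tsrv(y, K, J=1):
    """Two-scale realized volatility [Y,Y]^{(K)} - (n_K / n_J) [Y,Y]^{(J)}.
    Assumes n_K = (n - K + 1) / K and n_J = (n - J + 1) / J, so n_J = n when J = 1."""
    n = len(y) - 1
    n_K, n_J = (n - K + 1) / K, (n - J + 1) / J
    return subsampled_rv(y, K) - n_K / n_J * subsampled_rv(y, J)

def subgrid_rvs(x, K):
    """V_K^{(l)}, l = 0, ..., K-1: realized volatility on the l-th sparse subgrid."""
    return np.array([np.sum(np.diff(x[l::K]) ** 2) for l in range(K)])

rng = np.random.default_rng(7)
n, K, eta = 20_000, 737, 0.01                  # K roughly n^(2/3); eta is the noise s.d.
x = np.concatenate([[0.0], np.cumsum(rng.standard_normal(n) / np.sqrt(n))])  # sigma = 1
y = x + eta * rng.standard_normal(n + 1)       # noisy observations X + eps
# Subgrid identity behind (36): [X,X]^{(K)} = (1/K) * sum_l V_K^{(l)}.
print(np.isclose(subsampled_rv(x, K), subgrid_rvs(x, K).sum() / K))   # True
print(subsampled_rv(y, 1), tsrv(y, K))         # all-data RV inflated by noise; TSRV near 1
```

The first check confirms the identity $[X,X]_1^{(K)} = K^{-1}\sum_{l=0}^{K-1}V_K^{(l)}$ used in (36); the second line shows the noise inflating the all-data realized volatility while the TSRV stays close to the integrated variance.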

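Recall that we consider the case when $J = 1$, or $n_J = n$. We are interested in

$$\begin{aligned}
\sqrt{n_K}\Big(\langle X,X\rangle_1 - \int_0^1\sigma_t^2\,dt\Big) ={}& \frac1K\sum_{l=0}^{K-1}\sqrt{n_K}\Big(V_K^{(l)} - \int_0^1\sigma_t^2\,dt\Big) - \Big(\frac{n_K}{n}\Big)^{3/2}\sqrt n\Big([X,X]_1^{(1)} - \int_0^1\sigma_t^2\,dt\Big)\\
&- \frac{n_K^{3/2}}{n}\int_0^1\sigma_t^2\,dt + \sqrt{n_K}R_1 + \sqrt{n_K}R_2.
\end{aligned}\tag{37}$$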
The key idea is to compute the moment generating function of each term in (37) and then to use Lemma 2 to conclude.

For the first term in (37), since $V_K^{(l)}$ is a realized volatility based on the discretely observed $X$ process, with observation frequency satisfying $\sup_{1 \le i \le n_K} n_K\cdot(\tau_{iK+l} - \tau_{(i-1)K+l}) \le C_\Delta$, we have, by (33) in Lemma 3, for $|\theta| < \sqrt{n_K}/(4C_\sigma^2C_\Delta)$,

$$E\exp\Big\{\theta\sqrt{n_K}\Big(V_K^{(l)} - \int_0^1\sigma_t^2\,dt\Big)\Big\} \le \exp\{2\theta^2C_\sigma^4C_\Delta^2\}. \tag{38}$$



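For the second term in (37), we have obtained in (33) that

$$E\exp\Big\{\theta\sqrt n\Big([X,X]_1^{(1)} - \int_0^1\sigma_t^2\,dt\Big)\Big\} \le \exp\{2\theta^2C_\sigma^4C_\Delta^2\}, \quad \text{when } |\theta| < \frac{\sqrt n}{4C_\sigma^2C_\Delta}. \tag{39}$$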



The third term in (37) can be ignored because it has an upper bound that goes to zero sufficiently fast as $n$ grows:

$$\frac{n_K^{3/2}}{n}\int_0^1\sigma_t^2\,dt \le \frac{C_\sigma^2n_K^{3/2}}{n} \to 0, \tag{40}$$

by Condition 5.

We introduce an auxiliary sequence $a_n$ that grows with $n$ at a moderate rate to facilitate the presentation that follows. In particular, we can set $a_n = n^{1/12}$.
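Let us now deal with the fourth term in (37). Writing $\varepsilon_i$ for $\varepsilon_i^X$ and $\eta_X^2$ for its variance, note that from the definition

$$\begin{aligned}
\sqrt{n_K}R_1 ={}& \frac{2n_K^{3/2}}{\sqrt n}\cdot\frac{1}{\sqrt n}\sum_{i=1}^n\varepsilon_i\varepsilon_{i-1} - \frac{2\sqrt{n_K}\sqrt{n-K+1}}{K}\cdot\frac{1}{\sqrt{n-K+1}}\sum_{i=K}^n\varepsilon_i\varepsilon_{i-K}\\
&- \frac{\sqrt{n_K}\sqrt{K-1}\,a_n}{K}\cdot\frac{1}{a_n\sqrt{K-1}}\sum_{i=1}^{K-1}(\varepsilon_i^2 - \eta_X^2) - \frac{\sqrt{n_K}\sqrt{K-1}\,a_n}{K}\cdot\frac{1}{a_n\sqrt{K-1}}\sum_{i=n-K+1}^{n-1}(\varepsilon_i^2 - \eta_X^2)\\
&+ \frac{2(K-1)\sqrt{n_K}\sqrt{n-1}\,a_n}{nK}\cdot\frac{1}{a_n\sqrt{n-1}}\sum_{i=1}^{n-1}(\varepsilon_i^2 - \eta_X^2) + \frac{(K-1)\sqrt{n_K}}{\sqrt n\,K}\cdot\frac{1}{\sqrt n}(\varepsilon_0^2 + \varepsilon_n^2 - 2\eta_X^2).
\end{aligned}\tag{41}$$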
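Because (41) is a purely algebraic identity, it can be checked numerically. The sketch below computes $\sqrt{n_K}R_1$ directly from its definition (with $J = 1$ and $n_K = (n-K+1)/K$) and from a six-term expansion consisting of the lag-1 and lag-$K$ cross products, the two boundary sums of centered squares, the middle sum over $i = 1, \dots, n-1$, and the end-point term in $\varepsilon_0$ and $\varepsilon_n$. The scaling factors such as $a_n$ cancel and are omitted; the values of $n$, $K$ and the noise variance are arbitrary, and the function names are ours.

```python
import numpy as np

def r1_direct(eps, K):
    """R_1 = [e,e]^{(K)} - (n_K/n) [e,e]^{(1)}, with n_K = (n-K+1)/K (the J = 1 case)."""
    n = len(eps) - 1
    n_K = (n - K + 1) / K
    rv_K = np.sum((eps[K:] - eps[:-K]) ** 2) / K
    rv_1 = np.sum(np.diff(eps) ** 2)
    return rv_K - n_K / n * rv_1

def r1_expanded(eps, K, eta2):
    """sqrt(n_K) R_1 via the six-term expansion; rescaling factors cancel, so omitted."""
    n = len(eps) - 1
    n_K = (n - K + 1) / K
    c = eps**2 - eta2                                  # centred squares eps_i^2 - eta_X^2
    terms = (
        2 * n_K / n * np.sum(eps[1:] * eps[:-1])       # lag-1 cross products, i = 1..n
        - 2 / K * np.sum(eps[K:] * eps[:-K])           # lag-K cross products, i = K..n
        - np.sum(c[1:K]) / K                           # i = 1..K-1
        - np.sum(c[n - K + 1:n]) / K                   # i = n-K+1..n-1
        + 2 * (K - 1) / (n * K) * np.sum(c[1:n])       # i = 1..n-1
        + (K - 1) / (n * K) * (c[0] + c[n])            # end points i = 0 and i = n
    )
    return np.sqrt(n_K) * terms

rng = np.random.default_rng(1)
n, K = 1000, 30
eps = 0.3 * rng.standard_normal(n + 1)
lhs = np.sqrt((n - K + 1) / K) * r1_direct(eps, K)
print(np.isclose(lhs, r1_expanded(eps, K, eta2=0.09)))   # True
```

The identity holds for any value passed as `eta2`, because the coefficients of the centered squares sum to zero; the centering matters only for the moment generating function bounds that follow.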



The first two terms in (41) are not sums of independent variables, but they can be decomposed into sums of independent random variables whose moment generating functions can be computed. To simplify the argument without losing the essential ingredient, let us focus on the first term of (41). It can be decomposed as

$$\sum_{i=1}^n\varepsilon_i\varepsilon_{i-1} = \sum_{\text{odd } i}\varepsilon_i\varepsilon_{i-1} + \sum_{\text{even } i}\varepsilon_i\varepsilon_{i-1},$$

and the summands in each term on the right-hand side are now independent. Therefore, we need only calculate the moment generating function of $\varepsilon_i\varepsilon_{i-1}$.

For two independent normally distributed random variables $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$, it can easily be computed that

$$E\exp\Big\{\frac{\theta}{\sqrt n}XY\Big\} = \Big(1 - \frac{\sigma_X^2\sigma_Y^2\theta^2}{n}\Big)^{-1/2} \le \exp\{\sigma_X^2\sigma_Y^2\theta^2/n\}, \quad \text{when } |\theta| < \frac{\sqrt n}{\sqrt2\,\sigma_X\sigma_Y},$$

where we have used the fact that $\log(1-x) \ge -2x$ when $0 \le x \le 1/2$.
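The moment generating function of a product of independent normals, and the bound just used, can be checked numerically. In the sketch below (standard library only; names are ours) the closed form follows from conditioning on $Y$; the values of $\sigma_X$, $\sigma_Y$ and $n$ are arbitrary.

```python
import math

def mgf_normal_product(theta, sx, sy, n):
    """E exp{theta X Y / sqrt(n)} for independent X ~ N(0, sx^2), Y ~ N(0, sy^2).
    Conditioning on Y gives E exp{theta^2 sx^2 Y^2 / (2n)} = (1 - u)^(-1/2)."""
    u = (theta * sx * sy) ** 2 / n
    if u >= 1:
        return math.inf                      # the expectation diverges
    return (1.0 - u) ** -0.5

def product_bound(theta, sx, sy, n):
    """The bound exp{sx^2 sy^2 theta^2 / n} used in the text."""
    return math.exp((sx * sy * theta) ** 2 / n)

sx, sy, n = 0.5, 2.0, 100
limit = math.sqrt(n) / (math.sqrt(2) * sx * sy)   # admissible range |theta| < limit
grid = [limit * k / 1000 for k in range(-999, 1000)]
print(all(mgf_normal_product(t, sx, sy, n) <= product_bound(t, sx, sy, n) * (1 + 1e-12)
          for t in grid))                                                  # True
print(mgf_normal_product(9.9, sx, sy, n) > product_bound(9.9, sx, sy, n))  # True
print(mgf_normal_product(11.0, sx, sy, n))                                 # inf
```

The last two lines illustrate why the range of $\theta$ matters: close to $u = 1$ the exact value exceeds $\exp\{\sigma_X^2\sigma_Y^2\theta^2/n\}$, and for $u \ge 1$ the expectation is infinite.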
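Hence, by the independence and the Cauchy-Schwarz inequality, it follows that (we assume $n$ is even to simplify the presentation)

$$\begin{aligned}
E\exp\Big\{\frac{\theta}{\sqrt n}\sum_{i=1}^n\varepsilon_i\varepsilon_{i-1}\Big\} &\le \Big[E\exp\Big\{\frac{2\theta}{\sqrt n}\sum_{\text{odd } i}\varepsilon_i\varepsilon_{i-1}\Big\}\Big]^{1/2}\Big[E\exp\Big\{\frac{2\theta}{\sqrt n}\sum_{\text{even } i}\varepsilon_i\varepsilon_{i-1}\Big\}\Big]^{1/2}\\
&= \Big(\frac{1}{1 - 4\eta_X^4\theta^2/n}\Big)^{n/4} \le \exp\{2\eta_X^4\theta^2\}, \quad \text{when } |\theta| < \frac{\sqrt n}{2\sqrt2\,\eta_X^2}.
\end{aligned}\tag{42}$$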



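The second term in $R_1$ works similarly and has the same bound. For example, when $n_K$ is even, one can use the decomposition

$$\sum_{i=K}^n\varepsilon_i\varepsilon_{i-K} = \sum_{j=1}^{n_K/2}\sum_{i=2jK-K}^{2jK-1}\varepsilon_i\varepsilon_{i-K} + \sum_{j=1}^{n_K/2}\sum_{i=2jK}^{2jK+K-1}\varepsilon_i\varepsilon_{i-K},$$

in which the summands of each double sum are independent.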
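The splitting can be made concrete by listing the indices. The sketch below (Python; names are ours) builds the two index sets of the decomposition for the lag-$K$ sum and checks that (i) together they cover $i = K, \dots, n$ exactly once, and (ii) within each set the pairs $(i, i-K)$ never share a noise index, which is what makes the products independent when the $\varepsilon_i$ are i.i.d. With $K = 1$ it reproduces the odd/even split of $\sum_i\varepsilon_i\varepsilon_{i-1}$.

```python
def lag_k_blocks(n, K):
    """Index sets of the two double sums splitting sum_{i=K}^{n} eps_i eps_{i-K}.
    Assumes n_K = (n - K + 1) / K is an even integer."""
    n_K, rem = divmod(n - K + 1, K)
    if rem or n_K % 2:
        raise ValueError("n_K must be an even integer")
    first = [i for j in range(1, n_K // 2 + 1) for i in range(2 * j * K - K, 2 * j * K)]
    second = [i for j in range(1, n_K // 2 + 1) for i in range(2 * j * K, 2 * j * K + K)]
    return first, second

def products_use_distinct_indices(indices, K):
    """True if the pairs (i, i - K) share no noise index, so that the products
    eps_i * eps_{i-K} are independent when the eps_i are i.i.d."""
    used = set()
    for i in indices:
        if i in used or i - K in used:
            return False
        used.update((i, i - K))
    return True

first, second = lag_k_blocks(n=44, K=5)                 # n_K = 8
print(sorted(first + second) == list(range(5, 45)))    # True: every i = K..n appears once
print(products_use_distinct_indices(first, 5), products_use_distinct_indices(second, 5))  # True True
print(lag_k_blocks(10, 1))                             # ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
```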



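The last four terms in (41) are sums of independent, centered, scaled $\chi^2_1$ random variables, and their moment generating functions can easily be bounded by using Lemma 1. Taking the term $\frac{1}{a_n\sqrt{K-1}}\sum_{i=1}^{K-1}(\varepsilon_i^2 - \eta_X^2)$ for example, we have

$$E\exp\Big\{\frac{\theta}{a_n\sqrt{K-1}}\sum_{i=1}^{K-1}(\varepsilon_i^2 - \eta_X^2)\Big\} \le \exp\{2\eta_X^4\theta^2/a_n^2\}, \quad \text{when } |\theta| < \frac{a_n\sqrt{K-1}}{4\eta_X^2}.$$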



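For the term $R_2$, we have

$$\sqrt{n_K}R_2 = -\frac{2n_K^{3/2}}{n}\sum_{i=1}^n\Delta X_i\,\varepsilon_i + \frac{2n_K^{3/2}}{n}\sum_{i=1}^n\Delta X_i\,\varepsilon_{i-1} + \frac{2\sqrt{n_K}}{K}\sum_{i=K}^n\Delta^{(K)}X_i\,\varepsilon_i - \frac{2\sqrt{n_K}}{K}\sum_{i=K}^n\Delta^{(K)}X_i\,\varepsilon_{i-K}, \tag{43}$$

where $\Delta X_i = X_{\tau_i} - X_{\tau_{i-1}}$ and $\Delta^{(K)}X_i = X_{\tau_i} - X_{\tau_{i-K}}$. After rescaling by $a_n$, the first term above satisfies

$$\begin{aligned}
E\exp\Big\{\frac{\theta}{a_n}\sum_{i=1}^n\Delta X_i\,\varepsilon_i\Big\} &= E\exp\Big\{\sum_{i=1}^n\Big(\frac{\theta}{a_n}\Delta X_i\Big)^2\frac{\eta_X^2}{2}\Big\}\\
&\le \Big(E\exp\Big\{\frac{\theta^2\eta_X^2C_\sigma^2C_\Delta Z^2}{2na_n^2}\Big\}\Big)^n = \Big(1 - \frac{\theta^2\eta_X^2C_\sigma^2C_\Delta}{na_n^2}\Big)^{-n/2}\\
&\le \exp\{\eta_X^2C_\sigma^2C_\Delta\theta^2/a_n^2\}, \quad \text{when } |\theta| < \frac{a_n\sqrt n}{\sqrt2\,\eta_XC_\sigma C_\Delta^{1/2}},
\end{aligned}\tag{44}$$

where in the second line we have again used the optional sampling theorem and the law of iterated expectations as in the derivation of Lemma 3, and $Z$ denotes a standard normal random variable. The second term in $R_2$ works similarly and has the same bound. For the third term, after rescaling by $a_n\sqrt K$, we have

$$\begin{aligned}
E\exp\Big\{\frac{\theta}{a_n\sqrt K}\sum_{i=K}^n\Delta^{(K)}X_i\,\varepsilon_i\Big\} &= E\Big[E\Big(\exp\Big\{\frac{\theta}{a_n\sqrt K}\sum_{i=K}^n\Delta^{(K)}X_i\,\varepsilon_i\Big\}\,\Big|\,X \text{ process}\Big)\Big]\\
&= E\exp\Big\{\frac{\theta^2\eta_X^2}{2a_n^2K}\sum_{l=0}^{K-1}\sum_{j=1}^{n_K}\big(\Delta^{(K)}X_{jK+l}\big)^2\Big\}\\
&\le \prod_{l=0}^{K-1}\Big[E\exp\Big\{\frac{\theta^2\eta_X^2}{2a_n^2}\sum_{j=1}^{n_K}\big(\Delta^{(K)}X_{jK+l}\big)^2\Big\}\Big]^{1/K}\\
&\le \exp\{\eta_X^2C_\sigma^2C_\Delta\theta^2/a_n^2\}, \quad \text{when } |\theta| < \frac{a_n\sqrt{n_K}}{\sqrt2\,\eta_XC_\sigma C_\Delta^{1/2}},
\end{aligned}\tag{45}$$

where we have used Hölder's inequality in the third line and, in the last line, the argument of (44) applied to each sparse subgrid $\{\tau_{jK+l}\}_j$, whose spacings satisfy $\sup_j n_K(\tau_{jK+l} - \tau_{(j-1)K+l}) \le C_\Delta$. The fourth term works similarly and has the same bound.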

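Combining the results for all the terms, (38)-(45), and applying Lemma 2 to (37), we find that the conditions of Lemma 2 are satisfied with the following set of parameters. First,

$$C_1 = C_{1,x}\sqrt{n_K}, \qquad C_{1,x} = \frac{1}{4C_\sigma^2C_\Delta}, \quad \text{for big enough } n. \tag{46}$$

Here $C_1$ is the smallest of the admissible ranges of $\theta$ in (38)-(45). Writing each range as $\sqrt{n_K}$ times a factor, the factors coming from (39)-(45) involve $\sqrt{n/n_K}$, $\sqrt K$ or $a_n$ and grow with $n$ under Condition 5, so the range in (38) is the binding one. Second, $C_2$ is the largest of the quadratic coefficients in (38)-(45). Those coming from the rescaled terms, such as $2\eta_X^4/a_n^2$ and $\eta_X^2C_\sigma^2C_\Delta/a_n^2$, vanish as $n$ grows, so that

$$C_2 = \max\{2C_\sigma^4C_\Delta^2,\ 2\eta_X^4\} = 2C_\sigma^4C_\Delta^2$$

for big enough $n$, considering that typically $C_\Delta \ge 1$ and $C_\sigma \ge \eta_X$.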



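Finally, we take

$$w = 14 = \lceil 2 + 8\sqrt2\,\rceil \ \ge\ \frac1K\sum_{l=0}^{K-1}1 + \Big(\frac{n_K}{n}\Big)^{3/2} + \frac{4\sqrt{n_Kn}}{K} + o(1), \tag{47}$$

which bounds the sum of the absolute values of the weights in (37) once $R_1$ and $R_2$ are expanded as in (41) and (43): the first two terms of (37) contribute $1 + (n_K/n)^{3/2}$; the coefficients $2n_K^{3/2}/\sqrt n$ and $2\sqrt{n_K(n-K+1)}/K$ of the first two terms of (41) add up to at most $4\sqrt{n_Kn}/K$, because $n_K \le n/K$; and the coefficients of the remaining, rescaled terms are $o(1)$. The inequality is valid when $n$ is big enough and Condition 5 is applied.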
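The choice $w = 14$ in (47) can be illustrated numerically. The sketch below sums the absolute weights of the decomposition of (37), using the coefficients of the six terms in the expansion of $\sqrt{n_K}R_1$ and the coefficients of (43) rescaled by $a_n$ (lag-1 terms) and $a_n\sqrt K$ (lag-$K$ terms). Condition 5 is not restated in this section, so the code assumes the hypothetical tuning $K = \lceil n^{2/3}\rceil$ and $a_n = n^{1/12}$; it checks the arithmetic under that assumption only, and the function name is ours.

```python
import math

def coefficient_sum(n):
    """Sum of |weights| in (37) after expanding R_1 and R_2, for the hypothetical
    tuning K = ceil(n^(2/3)) and a_n = n^(1/12)."""
    K = math.ceil(n ** (2 / 3))
    n_K = (n - K + 1) / K
    a_n = n ** (1 / 12)
    signal = 1 + (n_K / n) ** 1.5                               # first two terms of (37)
    cross = 2 * n_K**1.5 / math.sqrt(n) + 2 * math.sqrt(n_K * (n - K + 1)) / K
    squares = (2 * math.sqrt(n_K * (K - 1)) * a_n / K           # two boundary sums
               + 2 * (K - 1) * math.sqrt(n_K * (n - 1)) * a_n / (n * K)
               + (K - 1) * math.sqrt(n_K) / (math.sqrt(n) * K))  # end-point term
    r2 = 2 * (2 * n_K**1.5 * a_n / n) + 2 * (2 * math.sqrt(n_K) * a_n / math.sqrt(K))
    return signal + cross + squares + r2

w = math.ceil(2 + 8 * math.sqrt(2))
print(w)                                     # 14
for n in (10**4, 10**6, 10**8):
    print(n, round(coefficient_sum(n), 2), coefficient_sum(n) <= w)
```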
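By Lemma 2, when $0 \le x \le 2C_{1,x}C_2\sqrt{n_K}$,

$$P\Big\{\sqrt{n_K}\Big|\langle X,X\rangle_1 - \int_0^1\sigma_t^2\,dt\Big| > x\Big\} \le 4\exp\{-(16C_2w^2)^{-1}x^2\}.$$

By Condition 5 again, we have

$$P\Big\{n^{1/6}\Big|\langle X,X\rangle_1 - \int_0^1\sigma_t^2\,dt\Big| > x\Big\} \le P\Big\{\sqrt{n_K}\Big|\langle X,X\rangle_1 - \int_0^1\sigma_t^2\,dt\Big| > x/\sqrt2\Big\} \le 4\exp(-Cx^2), \quad \text{when } 0 \le x \le cn^{1/6}, \tag{48}$$

where

$$c = 2C_{1,x}C_2 \quad \text{and} \quad C = (32C_2w^2)^{-1}. \tag{49}$$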

This completes the proof of the result conditional on the observation times. Theorem 1 is proved by noting that this conditional result depends only on the observation frequency $n$ and not on the locations of the observation times, as long as Condition 3 is satisfied.
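For reference, the constants in (49) can be evaluated for given $C_\sigma$ and $C_\Delta$. The sketch assumes the "big enough $n$" values $C_{1,x} = (4C_\sigma^2C_\Delta)^{-1}$ and $C_2 = 2C_\sigma^4C_\Delta^2$ from (46) and the discussion after it, together with $w = 14$; the function name is ours.

```python
def theorem1_constants(C_sigma, C_delta, w=14):
    """c and C in (49) for the 'big enough n' case, where
    C_{1,x} = 1 / (4 C_sigma^2 C_delta) and C_2 = 2 C_sigma^4 C_delta^2."""
    C1x = 1.0 / (4 * C_sigma**2 * C_delta)
    C2 = 2 * C_sigma**4 * C_delta**2
    c = 2 * C1x * C2                 # simplifies to C_sigma^2 * C_delta
    C = 1.0 / (32 * C2 * w**2)       # simplifies to 1 / (64 w^2 C_sigma^4 C_delta^2)
    return c, C

c, C = theorem1_constants(C_sigma=1.0, C_delta=1.0)
print(c, round(1 / C))               # 1.0 12544
```

With $C_\sigma = C_\Delta = 1$ this gives $c = 1$ and $C = 1/12544$, so $4\exp(-Cx^2)$ falls below 1 only for $x$ larger than about 132.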

Note also that in the above proof we have demonstrated, by using a sequence that goes to $\infty$ at a moderate rate, that one can eliminate the impact of the small-order terms on the choices of the constants, as long as the moment generating functions of these terms satisfy inequalities of the form (27). We will use this technique again in the next subsection.



A.4 Proof of Theorem 2

We again conduct all the analysis assuming the observation times are given. Our final result holds because the conditional result does not depend on the locations of the observation times as long as Condition 3 is satisfied.

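Recall the notation for the observation times introduced in Section 3.2. Define

$$Z^+ = X + Y \quad \text{and} \quad Z^- = X - Y.$$

The processes $Z^+$ and $Z^-$ are diffusion processes with bounded volatility. To see this, let $W^+$ and $W^-$ be processes such that

$$dW_t^+ = \frac{\sigma_t^{(X)}\,dB_t^{(X)} + \sigma_t^{(Y)}\,dB_t^{(Y)}}{\sqrt{(\sigma_t^{(X)})^2 + (\sigma_t^{(Y)})^2 + 2\rho_t^{(X,Y)}\sigma_t^{(X)}\sigma_t^{(Y)}}}$$

and

$$dW_t^- = \frac{\sigma_t^{(X)}\,dB_t^{(X)} - \sigma_t^{(Y)}\,dB_t^{(Y)}}{\sqrt{(\sigma_t^{(X)})^2 + (\sigma_t^{(Y)})^2 - 2\rho_t^{(X,Y)}\sigma_t^{(X)}\sigma_t^{(Y)}}}.$$

Then $W^+$ and $W^-$ are standard Brownian motions by Lévy's characterization of Brownian motion (see, for example, Theorem 3.16 of Karatzas and Shreve (2000)). Writing

$$\sigma_t^{Z^+} = \sqrt{(\sigma_t^{(X)})^2 + (\sigma_t^{(Y)})^2 + 2\rho_t^{(X,Y)}\sigma_t^{(X)}\sigma_t^{(Y)}} \quad \text{and} \quad \sigma_t^{Z^-} = \sqrt{(\sigma_t^{(X)})^2 + (\sigma_t^{(Y)})^2 - 2\rho_t^{(X,Y)}\sigma_t^{(X)}\sigma_t^{(Y)}},$$

we have

$$dZ_t^+ = \sigma_t^{Z^+}\,dW_t^+ \quad \text{and} \quad dZ_t^- = \sigma_t^{Z^-}\,dW_t^-,$$

with $0 \le \sigma_t^{Z^+}, \sigma_t^{Z^-} \le 2C_\sigma$.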

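For the observed $Z^+$ and $Z^-$ processes, we have

$$Z^{+,o}_{v_i} = X^o_{t_i} + Y^o_{s_i} = Z^+_{v_i} + \varepsilon_{i,+} \quad \text{and} \quad Z^{-,o}_{v_i} = X^o_{t_i} - Y^o_{s_i} = Z^-_{v_i} + \varepsilon_{i,-},$$

where $X^o$ and $Y^o$ denote the observed (noisy) prices, $t_i$ and $s_i$ are the last ticks at or before $v_i$, and

$$\varepsilon_{i,+} = X_{t_i} - X_{v_i} + Y_{s_i} - Y_{v_i} + \varepsilon_i^X + \varepsilon_i^Y,$$

$$\varepsilon_{i,-} = X_{t_i} - X_{v_i} - Y_{s_i} + Y_{v_i} + \varepsilon_i^X - \varepsilon_i^Y.$$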
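The last-tick convention and the synchronization step can be sketched as follows. The code assumes sorted lists of tick times and uses the usual refresh-time construction, in which each refresh time is the first time at which every asset has traded since the previous refresh time, in the spirit of Barndorff-Nielsen et al. (2008). It is an illustration, not the paper's implementation, and the function names and tick times are ours.

```python
from bisect import bisect_right

def refresh_times(*tick_lists):
    """Refresh times for sorted tick-time lists: each v_i is the first time at which
    every asset has traded at least once after v_{i-1}."""
    times = []
    last = float("-inf")
    while True:
        nexts = []
        for ticks in tick_lists:
            k = bisect_right(ticks, last)     # index of first tick strictly after `last`
            if k == len(ticks):
                return times                  # some asset has no further trade
            nexts.append(ticks[k])
        last = max(nexts)
        times.append(last)

def previous_tick(ticks, v):
    """Last tick at or before v (the t_i or s_i of the text)."""
    k = bisect_right(ticks, v)
    if k == 0:
        raise ValueError("no tick at or before v")
    return ticks[k - 1]

x_ticks = [0.1, 0.2, 0.5, 0.9]
y_ticks = [0.15, 0.3, 0.4, 0.95]
v = refresh_times(x_ticks, y_ticks)
print(v)                                          # [0.15, 0.3, 0.5, 0.95]
print([previous_tick(x_ticks, t) for t in v])    # [0.1, 0.2, 0.5, 0.9]
```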



Note that

$$\langle X,Y\rangle_1 = \frac14\big(\langle Z^+,Z^+\rangle_1 - \langle Z^-,Z^-\rangle_1\big).$$
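The polarization identity above holds exactly at the level of the estimators, because the two-scale statistic is bilinear in its two arguments. The sketch below (NumPy; names and simulation settings are ours) computes a two-scale realized covariance directly and through $\frac14(\langle Z^+,Z^+\rangle_1 - \langle Z^-,Z^-\rangle_1)$ on simulated synchronized, noisy observations with correlation $\rho = 0.5$, assuming $J = 1$ and $n_K = (n-K+1)/K$.

```python
import numpy as np

def two_scale(a, b, K, J=1):
    """Two-scale realized (co)variance [a,b]^{(K)} - (n_K/n_J) [a,b]^{(J)} for synchronized
    series of length n + 1, assuming n_K = (n-K+1)/K and n_J = (n-J+1)/J."""
    n = len(a) - 1
    def sub(k):                                    # [a,b]^{(k)}
        return np.sum((a[k:] - a[:-k]) * (b[k:] - b[:-k])) / k
    n_K, n_J = (n - K + 1) / K, (n - J + 1) / J
    return sub(K) - n_K / n_J * sub(J)

rng = np.random.default_rng(3)
n, K, rho, eta = 20_000, 737, 0.5, 0.01
dw1 = rng.standard_normal(n) / np.sqrt(n)
dw2 = rho * dw1 + np.sqrt(1 - rho**2) * rng.standard_normal(n) / np.sqrt(n)
x = np.concatenate([[0.0], np.cumsum(dw1)]) + eta * rng.standard_normal(n + 1)
y = np.concatenate([[0.0], np.cumsum(dw2)]) + eta * rng.standard_normal(n + 1)

tscv = two_scale(x, y, K)
via_polarization = (two_scale(x + y, x + y, K) - two_scale(x - y, x - y, K)) / 4
print(np.isclose(tscv, via_polarization), tscv)   # True, and tscv is near rho = 0.5
```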



We can first prove results analogous to Theorem 1 for $\langle Z^+,Z^+\rangle_1$ and $\langle Z^-,Z^-\rangle_1$, and then use them to obtain the final conclusion for TSCV.

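For $\langle Z^+,Z^+\rangle_1$, the derivation differs from that of Theorem 1 only in the terms that involve the noise, namely $\sqrt{n_K}R_1$ and $\sqrt{n_K}R_2$. Write $\Delta X_i = X_{t_i} - X_{v_i}$ and $\Delta Y_i = Y_{s_i} - Y_{v_i}$. Then the first term in $\sqrt{n_K}R_1$ becomes

$$\begin{aligned}
\frac{2n_K^{3/2}}{n}\sum_{i=1}^n\varepsilon_{i,+}\varepsilon_{i-1,+} = \frac{2n_K^{3/2}}{n}\sum_{i=1}^n\Big[&(\Delta X_i + \Delta Y_i)(\Delta X_{i-1} + \Delta Y_{i-1}) + (\Delta X_i + \Delta Y_i)(\varepsilon_{i-1}^X + \varepsilon_{i-1}^Y)\\
&+ (\varepsilon_i^X + \varepsilon_i^Y)(\Delta X_{i-1} + \Delta Y_{i-1}) + (\varepsilon_i^X + \varepsilon_i^Y)(\varepsilon_{i-1}^X + \varepsilon_{i-1}^Y)\Big].
\end{aligned}$$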



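The only $O_p(1)$ term is the last one, which involves only independent normals and can be dealt with in the same way as before. Writing $\xi_i = \varepsilon_i^X + \varepsilon_i^Y \sim N(0, \eta_X^2 + \eta_Y^2)$ and again assuming $n$ is even for simplicity of presentation, we have

$$\begin{aligned}
E\exp\Big\{\frac{\theta}{\sqrt n}\sum_{i=1}^n\xi_i\xi_{i-1}\Big\} &\le \Big[E\exp\Big\{\frac{2\theta}{\sqrt n}\sum_{\text{odd } i}\xi_i\xi_{i-1}\Big\}\Big]^{1/2}\Big[E\exp\Big\{\frac{2\theta}{\sqrt n}\sum_{\text{even } i}\xi_i\xi_{i-1}\Big\}\Big]^{1/2}\\
&= \Big(\frac{1}{1 - 4(\eta_X^2 + \eta_Y^2)^2\theta^2/n}\Big)^{n/4} \le \exp\{2(\eta_X^2 + \eta_Y^2)^2\theta^2\}, \quad \text{when } |\theta| < \frac{\sqrt n}{2\sqrt2\,(\eta_X^2 + \eta_Y^2)}.
\end{aligned}$$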



The other terms are of a smaller order of magnitude. By applying a sequence $a_n$ that grows moderately with $n$, as in the proof of Theorem 1 (we can set $a_n = n^{1/12}$), we can easily see that their exact bounds do not affect our choice of $C_1$, $C_2$ or $w$. All we need to show is that the moment generating functions of these terms can indeed be suitably bounded as in (27). To show this, first note that, for any positive number $a$ and real-valued $b$, by the optional sampling theorem (applied to the submartingales $\exp(aB_s^2)$ and $\exp(b\lambda B_s)$ with the stopping time $[X]_u \le C_\sigma^2u$, for a real number $\lambda$), we have

$$E\big(\exp\{a(\Delta X_i)^2\}\,\big|\,\mathcal F_{i-1}\big) \le E\exp\{aC_\sigma^2C_\Delta Z^2/n\} = \Big(\frac{1}{1 - 2aC_\sigma^2C_\Delta/n}\Big)^{1/2}, \quad \text{for } Z \sim N(0,1), \tag{50}$$

where $\mathcal F_i$ is the information collected up to time $v_i$. Inequality (50) also holds when $\Delta X_i$ is replaced by $\Delta Y_i$. Similarly,



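$$\begin{aligned}
E\big\{\exp\{b\Delta X_i\Delta Y_{i-1}\}\,\big|\,\mathcal F_{i-2}\big\} &= E\big[E\big(\exp\{b\Delta X_i\Delta Y_{i-1}\}\,\big|\,\mathcal F_{i-1}\big)\,\big|\,\mathcal F_{i-2}\big]\\
&\le E\big(\exp\{b^2C_\Delta C_\sigma^2(\Delta Y_{i-1})^2/(2n)\}\,\big|\,\mathcal F_{i-2}\big)\\
&\le \Big(\frac{1}{1 - b^2C_\Delta^2C_\sigma^4/n^2}\Big)^{1/2}.
\end{aligned}\tag{51}$$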



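The inequalities (50) and (51) can be used to obtain the bounds we need. For example, by (51) and the law of iterated expectations,

$$E\exp\Big\{\theta\sum_{\text{odd } i}\Delta X_i\Delta Y_{i-1}\Big\} \le \exp\{\theta^2C_\sigma^4C_\Delta^2/(2n)\}, \quad \text{when } |\theta| < \frac{n}{\sqrt2\,C_\sigma^2C_\Delta},$$

and, by independence, normality of the noise, the law of iterated expectations and (50), we have

$$\begin{aligned}
E\exp\Big\{\frac{\theta}{a_n}\sum_{i=1}^n\Delta X_i(\varepsilon_{i-1}^X + \varepsilon_{i-1}^Y)\Big\} &= E\exp\Big\{\sum_{i=1}^n\Big(\frac{\theta}{a_n}\Delta X_i\Big)^2\frac{\eta_X^2 + \eta_Y^2}{2}\Big\}\\
&\le \Big(\frac{1}{1 - (\eta_X^2 + \eta_Y^2)\theta^2C_\sigma^2C_\Delta/(na_n^2)}\Big)^{n/2}\\
&\le \exp\{(\eta_X^2 + \eta_Y^2)C_\sigma^2C_\Delta\theta^2/a_n^2\}, \quad \text{when } |\theta| < \frac{a_n\sqrt n}{C_\sigma\sqrt{2(\eta_X^2 + \eta_Y^2)C_\Delta}}.
\end{aligned}$$

Similar results can be obtained for the other terms above with the same techniques.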

The second term in $\sqrt{n_K}R_1$ works similarly and has the same bound. The other terms in $\sqrt{n_K}R_1$ and the whole term $\sqrt{n_K}R_2$ are of order $o_p(1)$. Again, by using the sequence $a_n$, we can conclude immediately that their exact bounds do not matter in our choice of the constants, and we only need to show that their moment generating functions are appropriately bounded as in (27). The arguments needed to prove inequalities of the form (27) for each element of these terms are similar to those presented in the above proofs and are omitted here.

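Hence, still letting $w = 14$ and redefining

$$C_2 = \max\{2(2C_\sigma)^4C_\Delta^2,\ 2(\eta_X^2 + \eta_Y^2)^2\} = 32C_\sigma^4C_\Delta^2 \quad \text{for the typical case when } C_\sigma > \eta_X, \eta_Y,$$

we have, when $0 \le x \le c'n^{1/6}$,

$$P\Big\{n^{1/6}\Big|\langle Z^+,Z^+\rangle_1 - \int_0^1(\sigma_t^{Z^+})^2\,dt\Big| > x\Big\} \le 4\exp(-C'x^2)$$

and

$$P\Big\{n^{1/6}\Big|\langle Z^-,Z^-\rangle_1 - \int_0^1(\sigma_t^{Z^-})^2\,dt\Big| > x\Big\} \le 4\exp(-C'x^2),$$

where

$$c' = 2C_{1,x}C_2 \quad \text{and} \quad C' = (32C_2w^2)^{-1}.$$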
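Finally, for the TSCV estimator, note that $(\sigma_t^{Z^+})^2 - (\sigma_t^{Z^-})^2 = 4\rho_t^{(X,Y)}\sigma_t^{(X)}\sigma_t^{(Y)}$, so the estimation error of $\langle X,Y\rangle_1$ is one quarter of the difference of the estimation errors of $\langle Z^+,Z^+\rangle_1$ and $\langle Z^-,Z^-\rangle_1$. Hence, when $0 \le x \le cn^{1/6}$,

$$\begin{aligned}
&P\Big\{n^{1/6}\Big|\langle X,Y\rangle_1 - \int_0^1\sigma_t^{(X)}\sigma_t^{(Y)}\rho_t^{(X,Y)}\,dt\Big| > x\Big\}\\
&\quad\le P\Big\{n^{1/6}\Big|\langle Z^+,Z^+\rangle_1 - \int_0^1(\sigma_t^{Z^+})^2\,dt\Big| > 2x\Big\} + P\Big\{n^{1/6}\Big|\langle Z^-,Z^-\rangle_1 - \int_0^1(\sigma_t^{Z^-})^2\,dt\Big| > 2x\Big\}\\
&\quad\le 8\exp(-Cx^2),
\end{aligned}$$

where

$$c = c'/2 = C_{1,x}C_2 \quad \text{and} \quad C = 4C' = (8C_2w^2)^{-1}. \tag{52}$$

This completes the proof.

Note that the argument is not restricted to TSCV based on the pairwise-refresh times: it works the same way (only with $n$ replaced by $n^*$, the observation frequency of the all-refresh method) when the synchronization scheme is chosen to be the all-refresh method, as long as the sampling conditions, Conditions 3-4, are satisfied.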